Re: [openstack-dev] [Nova] FFE Request: Oslo: i18n Message improvements
On 3/7/2014 4:15 AM, Sean Dague wrote: On 03/06/2014 04:46 PM, James Carey wrote: Please consider an FFE for i18n Message improvements: BP: https://blueprints.launchpad.net/nova/+spec/i18n-messages The base enablement for lazy translation has already been sync'd from oslo. This patch was to enable lazy translation support in Nova. It is titled re-enable lazy translation because this was enabled during Havana but was pulled due to issues that have since been resolved. In order to enable lazy translation it is necessary to do the following things: (1) Fix a bug in oslo with respect to how keywords are extracted from the format strings when saving replacement text for use when the message translation is done. This is https://bugs.launchpad.net/nova/+bug/1288049, which I'm actively working on a fix for in oslo. Once that is complete it will need to be sync'd into nova. (2) Remove concatenation (+) of translatable messages. The current class that is used to hold the translatable message (gettextutils.Message) does not support concatenation. There were a few cases in Nova where this was done and they are converted to other means of combining the strings in: https://review.openstack.org/#/c/78095 (Remove use of concatenation on messages) (3) Remove the use of str() on exceptions. The intent of this is to return the message contained in the exception, but these messages may contain unicode, so str cannot be used on them and gettextutils.Message enforces this. Thus these need to either be removed and allow python formatting to do the right thing, or changed to unicode(). Since unicode() will change to str() in Py3, the forward-compatible six.text_type() is used instead. This is done in: https://review.openstack.org/#/c/78096 (Remove use of str() on exceptions) (4) The addition of the call that enables the use of lazy messages. This is in: https://review.openstack.org/#/c/73706 (Re-enable lazy translation). Lazy translation has been enabled in the other projects so it would be beneficial to be consistent with the other projects with respect to message translation. Unless it has landed in *every other* integrated project besides Nova, I don't find this compelling. I have tested that the changes in (2) and (3) work when lazy translation is not enabled. Thus if a problem is found, the two-line change in (4) could be removed to get to the previous behavior. I've been talking to Matt Riedemann and Dan Berrange about this. Matt has agreed to be a sponsor. If this is enabled in other projects, where is the Tempest scenario test that actually demonstrates that this is working on real installs? I get that everyone has features that didn't hit. However, now is not the time for that; now is the time for people to get focused on bug hunting. And especially if we are talking about *another* oslo sync. -1 -Sean ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev The Tempest requirement just came up yesterday. FWIW, this i18n stuff has been working its way in since Grizzly, and the Tempest requirement is new. I'm not saying it's not valid, but the timing sucks - but that's life. Also, the oslo sync would be to one module, gettextutils, which I don't think pulls in anything else from oslo. Anyway, this is in Keystone, Glance, Cinder, Neutron and Ceilometer at least. Patches are working their way through Heat as I understand it.
I'm not trying to turn this into a crusade, just trying to get out what I know about the current state of things. I'll let Jim Carey or Jay Bryant discuss it more since they've been more involved in the blueprints across all the projects. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
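To make the two code changes described in (2) and (3) above concrete, here is a minimal before/after sketch (a hypothetical illustration only; the real changes are in the reviews linked above, and the lazy _() import assumes a nova tree with oslo's gettextutils synced):

    # Hypothetical illustration of the two patterns described above; the
    # actual changes are in the linked Nova reviews.
    import logging

    import six
    from nova.openstack.common.gettextutils import _  # returns Message objects

    LOG = logging.getLogger(__name__)


    def report_boot_failure(reason):
        # (2) gettextutils.Message does not support '+', so
        #         msg = _("Failed to boot instance: ") + reason
        #     breaks under lazy translation. Combine via format parameters:
        msg = _("Failed to boot instance: %(reason)s") % {'reason': reason}

        # (3) str() raises UnicodeEncodeError if the (possibly translated)
        #     message contains non-ASCII text; six.text_type() is unicode()
        #     on Python 2 and str() on Python 3, so it is safe in both.
        try:
            raise RuntimeError(msg)
        except RuntimeError as exc:
            LOG.error(u"Boot failed: %s", six.text_type(exc))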
Re: [openstack-dev] [nova][db] Thoughts on making instances.uuid non-nullable?
On 12/16/2013 11:01 AM, Shawn Hartsock wrote: +1 on a migration to make uuid a non-nullable column. I advocated a few patches back in Havana that make assumptions based on the UUID being present and unique per instance. If it gets nulled the VMware drivers will have breakage and I have no idea how to avoid that reasonably without the UUID. On Mon, Dec 16, 2013 at 11:59 AM, Russell Bryant rbry...@redhat.com mailto:rbry...@redhat.com wrote: On 12/16/2013 11:45 AM, Matt Riedemann wrote: 1. Add a migration to change instances.uuid to non-nullable. Besides the obvious con of having yet another migration script, this seems the most straightforward. The instance object class already defines the uuid field as non-nullable, so it's constrained at the objects layer, just not in the DB model. Plus I don't think we'd ever have a case where instance.uuid is null, right? Seems like a lot of things would break down if that happened. With this option I can build on top of it for the DB2 migration support to add the same FKs as the other engines. Yeah, having instance.uuid nullable doesn't seem valuable to me, so this seems OK. -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org mailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- # Shawn.Hartsock - twitter: @hartsock - plus.google.com/+ShawnHartsock http://plus.google.com/+ShawnHartsock ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I've been working on this more and am running up against some issues; part of this has to do with my lack of sqlalchemy know-how and inexperience with writing DB migrations, so I'm dumping some info/problems here to see where people would like me to take this. My original thinking for doing a migration was to delete the instances records where uuid == None and then move those to shadow_instances, then make instances.uuid.nullable=False. Some of the problems with this approach are:

1. There are at least 5 other tables related to instances that need to be handled for a delete: InstanceFault, InstanceMetadata, InstanceSystemMetadata, InstanceInfoCache and SecurityGroupInstanceAssociation. Also, these tables don't define their instance_uuid column the same way; some have it nullable=False and others don't.

2. I'm not sure if I can use a session in the migration to make it a transaction.

3. This would make the instances and shadow_instances tables have different schemas, i.e. instances.uuid would be nullable=False in instances but nullable=True in shadow_instances. Maybe this doesn't matter. The whole reason behind using shadow_instances (or any backup table I guess) was so I could restore the records on DB downgrade.

So the more I think about this, I'm getting to the point of asking:

1. Do we even care about instances where uuid is None? I'd have to think those wouldn't be working well in the current code with how everything relies on uuid for foreign keys and tracking relationships to volumes, images and networks across services. If the answer is 'no' then the migration is pretty simple: just delete the records where uuid is None and be done with it. You couldn't downgrade to get them back, but in this case we're asserting that we don't want them back.

2. Have an alternative schema in the DB2 case.
This would be handled in the 216_havana migration when the instances table is defined and created, we'd just make the uuid column non-nullable in the DB2 case and leave it nullable for all other engines. Anyone moving to DB2 would have to install from scratch anyway since there is no tooling to migrate a MySQL DB to DB2, for example. As it stands, the 216_havana migration in my patch [1] already has a different schema for DB2 because of the foreign keys it can't create due to this problem. Anyway, looking for some thoughts on how to best handle this, or if anyone has other ideas or good reasons why either approach couldn't be used. [1] https://review.openstack.org/#/c/69047/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
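For a sense of scale, option 1 boils down to a small sqlalchemy-migrate script along these lines (a sketch under the assumption that null-uuid records are simply dropped; this is not the actual patch, and the related tables from problem 1 would still need the same treatment):

    # Hypothetical migration for option 1: drop rows with a NULL uuid,
    # then tighten the column. Not the actual proposed patch.
    import migrate  # noqa: importing migrate activates Column.alter()
    from sqlalchemy import MetaData, Table


    def upgrade(migrate_engine):
        meta = MetaData(bind=migrate_engine)
        instances = Table('instances', meta, autoload=True)
        # Assert that instances without a uuid are not worth keeping.
        instances.delete().where(instances.c.uuid == None).execute()  # noqa
        instances.c.uuid.alter(nullable=False)


    def downgrade(migrate_engine):
        meta = MetaData(bind=migrate_engine)
        instances = Table('instances', meta, autoload=True)
        # The deleted rows are gone for good; only relax the constraint.
        instances.c.uuid.alter(nullable=True)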
Re: [openstack-dev] [nova][db] Thoughts on making instances.uuid non-nullable?
On 3/9/2014 9:05 PM, ChangBo Guo wrote: 2014-03-10 4:47 GMT+08:00 Jay Pipes jaypi...@gmail.com mailto:jaypi...@gmail.com: 3. This would make the instances and shadow_instances tables have different schemas, i.e. instances.uuid would be nullable=False in instances but nullable=True in shadow_instances. Maybe this doesn't matter. No, I don't think this matters much, to be honest. I'm not entirely sure what the long-term purpose of the shadow tables are in Nova -- perhaps someone could clue me in to whether the plan is to keep them around? As I know, the shadow_* tables are used by the command 'nova-manage db archive_deleted_rows', which moves records with deleted=True to the shadow_* tables. That means these tables are used by another process, so I think we need other tables to store the old records in your migration. So you mean move records where instances.uuid == None to shadow_instances? That's not going to work though if we make the uuid column non-nullable on both instances and shadow_instances, unless you generate a random UUID for the shadow_instances table records that get moved, which is just another hack - and that would break moving them back on downgrade since you wouldn't know which records to move back, i.e. since you wouldn't be able to query shadow_instances for records where instances.uuid == None. Other thoughts? If you did really want to back these records up, I think it would have to be a different backup table rather than shadow_instances since I think we want to keep the schema the same between instances and shadow_instances. -- ChangBo Guo(gcb) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
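For background, the archive command ChangBo mentions works roughly like the insert-from-select pattern below (a simplified sketch, not the actual nova-manage code), which is why the shadow tables are expected to mirror the live schema:

    # Simplified sketch of how 'nova-manage db archive_deleted_rows'
    # moves soft-deleted rows into the shadow_* tables; not the actual
    # implementation.
    from sqlalchemy import MetaData, Table, select


    def archive_deleted_rows(engine, tablename, max_rows):
        meta = MetaData(bind=engine)
        table = Table(tablename, meta, autoload=True)
        shadow = Table('shadow_' + tablename, meta, autoload=True)
        # Copy soft-deleted rows to the shadow table, then remove them
        # from the live table.
        query = select([table]).where(table.c.deleted != 0).limit(max_rows)
        rows = engine.execute(query).fetchall()
        if rows:
            engine.execute(shadow.insert(), [dict(row) for row in rows])
            ids = [row['id'] for row in rows]
            engine.execute(table.delete().where(table.c.id.in_(ids)))
        return len(rows)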
Re: [openstack-dev] [nova][db] Thoughts on making instances.uuid non-nullable?
On 3/9/2014 9:18 PM, Jay Pipes wrote: On Mon, 2014-03-10 at 10:05 +0800, ChangBo Guo wrote: 2014-03-10 4:47 GMT+08:00 Jay Pipes jaypi...@gmail.com: 3. This would make the instances and shadow_instances tables have different schemas, i.e. instances.uuid would be nullable=False in instances but nullable=True in shadow_instances. Maybe this doesn't matter. No, I don't think this matters much, to be honest. I'm not entirely sure what the long-term purpose of the shadow tables are in Nova -- perhaps someone could clue me in to whether the plan is to keep them around? As I know, the shadow_* tables are used by the command 'nova-manage db archive_deleted_rows', which moves records with deleted=True to the shadow_* tables. That means these tables are used by another process, so I think we need other tables to store the old records in your migration. Yeah, that's what I understood the shadow tables were used for, I just didn't know what the long-term future of these tables was... curious if there's been any discussion about that. Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I think Joe Gordon was working on something in the hopes of eventually killing the shadow tables but I can't remember exactly what that was now. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] FFE Request: Ephemeral RBD image support
On 3/10/2014 11:20 AM, Dmitry Borodaenko wrote: On Fri, Mar 7, 2014 at 8:55 AM, Sean Dague s...@dague.net wrote: On 03/07/2014 11:16 AM, Russell Bryant wrote: On 03/07/2014 04:19 AM, Daniel P. Berrange wrote: On Thu, Mar 06, 2014 at 12:20:21AM -0800, Andrew Woodward wrote: I'd like to request an FFE for the remaining patches in the Ephemeral RBD image support chain: https://review.openstack.org/#/c/59148/ and https://review.openstack.org/#/c/59149/ are still open after their dependency https://review.openstack.org/#/c/33409/ was merged. These should be low risk as: 1. We have been testing with this code in place. 2. It's nearly all contained within the RBD driver. This is needed as it implements essential functionality that has been missing in the RBD driver, and this will become the second release in which merging it has been attempted. Add me as a sponsor. OK, great. That's two. We have a hard deadline of Tuesday to get these FFEs merged (regardless of gate status). As alt release manager, FFE approved based on Russell's approval. The merge deadline for Tuesday is the release meeting, not end of day. If it's not merged by the release meeting, it's dead, no exceptions. Both commits were merged, thanks a lot to everyone who helped land this in Icehouse! Especially to Russell and Sean for approving the FFE, and to Daniel, Michael, and Vish for reviewing the patches! There was a bug reported today [1] that looks like a regression in this new code, so we need people involved in this looking at it as soon as possible because we have a proposed revert in case we need to yank it out [2]. [1] https://bugs.launchpad.net/nova/+bug/1291014 [2] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bug/1291014,n,z -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] FFE Request: Ephemeral RBD image support
On 3/11/2014 3:11 PM, Jay Pipes wrote: On Tue, 2014-03-11 at 14:18 -0500, Matt Riedemann wrote: On 3/10/2014 11:20 AM, Dmitry Borodaenko wrote: On Fri, Mar 7, 2014 at 8:55 AM, Sean Dague s...@dague.net wrote: On 03/07/2014 11:16 AM, Russell Bryant wrote: On 03/07/2014 04:19 AM, Daniel P. Berrange wrote: On Thu, Mar 06, 2014 at 12:20:21AM -0800, Andrew Woodward wrote: I'd like to request an FFE for the remaining patches in the Ephemeral RBD image support chain: https://review.openstack.org/#/c/59148/ and https://review.openstack.org/#/c/59149/ are still open after their dependency https://review.openstack.org/#/c/33409/ was merged. These should be low risk as: 1. We have been testing with this code in place. 2. It's nearly all contained within the RBD driver. This is needed as it implements essential functionality that has been missing in the RBD driver, and this will become the second release in which merging it has been attempted. Add me as a sponsor. OK, great. That's two. We have a hard deadline of Tuesday to get these FFEs merged (regardless of gate status). As alt release manager, FFE approved based on Russell's approval. The merge deadline for Tuesday is the release meeting, not end of day. If it's not merged by the release meeting, it's dead, no exceptions. Both commits were merged, thanks a lot to everyone who helped land this in Icehouse! Especially to Russell and Sean for approving the FFE, and to Daniel, Michael, and Vish for reviewing the patches! There was a bug reported today [1] that looks like a regression in this new code, so we need people involved in this looking at it as soon as possible because we have a proposed revert in case we need to yank it out [2]. [1] https://bugs.launchpad.net/nova/+bug/1291014 [2] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bug/1291014,n,z Note that I have identified the source of the problem and am pushing a patch shortly with unit tests. Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev My concern is how much other code assumes nova is working with the glance v2 API, because there was a nova blueprint [1] to make nova work with the glance v2 API but that never landed in Icehouse, so I'm worried about whack-a-mole type problems here, especially since there is no tempest coverage for testing multiple image location support via nova. [1] https://blueprints.launchpad.net/nova/+spec/use-glance-v2-api -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] FFE Request: Ephemeral RBD image support
On 3/11/2014 5:11 PM, Dmitry Borodaenko wrote: On Tue, Mar 11, 2014 at 1:31 PM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: There was a bug reported today [1] that looks like a regression in this new code, so we need people involved in this looking at it as soon as possible because we have a proposed revert in case we need to yank it out [2]. [1] https://bugs.launchpad.net/nova/+bug/1291014 [2] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bug/1291014,n,z Note that I have identified the source of the problem and am pushing a patch shortly with unit tests. My concern is how much other code assumes nova is working with the glance v2 API, because there was a nova blueprint [1] to make nova work with the glance v2 API but that never landed in Icehouse, so I'm worried about whack-a-mole type problems here, especially since there is no tempest coverage for testing multiple image location support via nova. [1] https://blueprints.launchpad.net/nova/+spec/use-glance-v2-api As I mentioned in the bug comments, the code that made the assumption about glance v2 API actually landed in September 2012: https://review.openstack.org/13017 The multiple image location patch simply made use of a method that was already there for more than a year. -DmitryB ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Yeah, I pointed that out today in IRC also. So kudos to Jay for getting a patch up quickly, and a really nice one at that with extensive test coverage. What I'd like to see in Juno is a tempest test that covers the multiple image locations code since it seems we obviously don't have that today. How hard is something like that with an API test? -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
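To seed that discussion, the rough shape of such a test might be something like the sketch below. This is purely illustrative: the base class and create_test_server() follow tempest's general structure, but create_image_with_locations() is a made-up helper standing in for whatever glance v2 plumbing the test would actually need.

    # Hypothetical shape of a compute API test for booting from an image
    # with multiple locations; create_image_with_locations() is a made-up
    # stand-in, not an existing tempest helper.
    from tempest.api.compute import base
    from tempest import test


    class ImageLocationsTest(base.BaseV2ComputeTest):

        @test.attr(type='gate')
        def test_boot_server_from_image_with_second_location(self):
            # Assumes glance v2 is configured and a second location can
            # be added to the image under test.
            image = self.create_image_with_locations(locations=2)
            # Waiting for ACTIVE is the assertion: the boot should
            # succeed no matter which location nova picks.
            self.create_test_server(image_id=image['id'],
                                    wait_until='ACTIVE')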
Re: [openstack-dev] [nova] An analysis of code review in Nova
this later and will add unit tests then' or 'it's hard to test this path without a lot of changes to how the tests are working'. That's unacceptable to me, and I generally give up on the review after that. So to move this all forward, I think the bp above should be top priority for the vmware team in Juno to keep bp patches moving at the pace they do, because the features and refactoring just keep coming and at least for me it's very easy to burn out on looking at those reviews. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] FFE Request: Ephemeral RBD image support
On 3/12/2014 6:32 PM, Dan Smith wrote: I'm confused as to why we arrived at the decision to revert the commits since Jay's patch was accepted. I'd like some details about this decision, and what new steps we need to take to get this back in for Juno. Jay's fix resolved the immediate problem that was reported by the user. However, after realizing why the bug manifested itself and why it didn't occur during our testing, all of the core members involved recommended a revert as the least-risky course of action at this point. If it took almost no time for that change to break a user that wasn't even using the feature, we're fearful about what may crop up later. We talked with the patch author (zhiyan) in IRC for a while after making the decision to revert about what the path forward for Juno is. The tl;dr as I recall is: 1. Full Glance v2 API support merged 2. Tests in tempest and nova that exercise Glance v2, and the new feature 3. Push the feature patches back in --Dan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Those are essentially the steps as I remember them too. Sean changed the dependencies in the blueprints so the nova glance v2 blueprint is the root dependency, then multiple images and then the other download handler blueprints at the top. I haven't checked but the blueprints should be marked as not complete (not sure what that would be now) and marked for next; the v2 glance root blueprint should be marked as high priority too so we get the proper focus when Juno opens up. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] An analysis of code review in Nova
On 3/12/2014 7:29 PM, Arnaud Legendre wrote: Hi Matt, I totally agree with you and actually we have been discussing this a lot internally the last few weeks. As a top priority, the driver MUST integrate with oslo.vmware. This will be achieved through this chain of patches [1]. We want these patches to be merged before other things. I think we should stop introducing more complexity which makes the task of refactoring more and more complicated. The integration with oslo.vmware is not a refactoring but should be seen as a way to get a more lightweight version of the driver which will make the task of refactoring a bit easier. Then, we want to actually refactor; we have had several meetings to work out the best strategy to adopt going forward (and avoid reproducing the same mistakes). The highest priority is spawn(): we need to make it modular and remove nested methods. This refactoring work should include the integration with the image handler framework [2] and introduce the notion of an image type object to avoid all these conditionals on image types inside the core logic. Breaking up the spawn method to make it modular and thus testable, or refactoring to use oslo.vmware - the order there doesn't seem to really matter to me since both sound good. But this scares me: "This refactoring work should include the integration with the image handler framework". Hopefully the refactoring being talked about here with oslo.vmware and breaking spawn into chunks can be done *before* any work to refactor the vmware driver to use the multiple image locations feature - it will probably have to be, given that was reverted out of Icehouse and will have some prerequisite work to do before it will land in Juno. I would like to see you cores involved in this design since you will be reviewing the code at some point. "Involved" here can be interpreted as reviewing the design and/or actually participating in the design discussions. I would like to get your POV on this. Let me know if this approach makes sense. Thanks, Arnaud [1] https://review.openstack.org/#/c/70175/ [2] https://review.openstack.org/#/c/33409/ - Original Message - From: Matt Riedemann mrie...@linux.vnet.ibm.com To: openstack-dev@lists.openstack.org Sent: Wednesday, March 12, 2014 11:28:23 AM Subject: Re: [openstack-dev] [nova] An analysis of code review in Nova On 2/25/2014 6:36 AM, Matthew Booth wrote: I'm new to Nova. After some frustration with the review process, specifically in the VMware driver, I decided to try to visualise how the review process is working across Nova. To that end, I've created 2 graphs, both attached to this mail. Both graphs show a nova directory tree pruned at the point that a directory contains less than 2% of total LOCs. Additionally, /tests and /locale are pruned as they make the resulting graph much busier without adding a great deal of useful information. The data for both graphs was generated from the most recent 1000 changes in gerrit on Monday 24th Feb 2014. This includes all pending changes, just over 500, and just under 500 recently merged changes. pending.svg shows the percentage of LOCs which have an outstanding change against them. This is one measure of how hard it is to write new code in Nova. merged.svg shows the average length of time between the ultimately-accepted version of a change being pushed and being approved. Note that there are inaccuracies in these graphs, but they should be mostly good.
Details of generation here: https://github.com/mdbooth/heatmap. This code is obviously single-purpose, but is free for re-use if anyone feels so inclined. The first graph above (pending.svg) is the one I was most interested in, and shows exactly what I expected it to. Note the size of 'vmwareapi'. If you check out Nova master, 24% of the vmwareapi driver has an outstanding change against it. It is practically impossible to write new code in vmwareapi without stomping on an outstanding patch. Compare that to the libvirt driver at a much healthier 3%. The second graph (merged.svg) is an attempt to look at why that is. Again comparing the VMware driver with the libvirt driver, we can see that at 12 days, it takes much longer for a change to be approved in the VMware driver than in the libvirt driver. I suspect that this isn't the whole story, which is likely a combination of a much longer review time with very active development. What's the impact of this? As I said above, it obviously makes it very hard to come in as a new developer of the VMware driver when almost a quarter of it has been rewritten, but you can't see it. I am very new to this and others should validate my conclusions, but I also believe this is having a detrimental
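The kind of pending-change data described above can be approximated with a short script against the gerrit REST API. The sketch below illustrates the approach only; it is not the linked heatmap tool:

    # Rough approximation of gathering pending-change data from the
    # gerrit REST API; the real analysis used the heatmap tool linked
    # above.
    import json

    import requests

    GERRIT = 'https://review.openstack.org'


    def open_changes(project='openstack/nova', count=500):
        url = ('%s/changes/?q=status:open+project:%s&n=%d'
               '&o=CURRENT_REVISION&o=CURRENT_FILES'
               % (GERRIT, project, count))
        resp = requests.get(url)
        # Gerrit prefixes JSON responses with ")]}'" to prevent XSSI.
        return json.loads(resp.text[4:])


    def touched_lines_by_dir(changes):
        # Tally lines touched per directory across all open changes.
        totals = {}
        for change in changes:
            rev = change['revisions'][change['current_revision']]
            for path, info in rev.get('files', {}).items():
                directory = path.rsplit('/', 1)[0] if '/' in path else '.'
                touched = (info.get('lines_inserted', 0) +
                           info.get('lines_deleted', 0))
                totals[directory] = totals.get(directory, 0) + touched
        return totals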
Re: [openstack-dev] Duplicate code for processing REST APIs
On 3/13/2014 4:13 PM, Roman Podoliaka wrote: Hi Steven, Code from openstack/common/ dir is 'synced' from oslo-incubator. The 'sync' is effectively a copy of oslo-incubator subtree into a project source tree. As syncs are not done at the same time, the code of synced modules may indeed be different for each project depending on which commit of oslo-incubator was synced. Thanks, Roman On Thu, Mar 13, 2014 at 2:03 PM, Steven Kaufer kau...@us.ibm.com wrote: While investigating some REST API updates I've discovered that there is a lot of duplicated code across the various OpenStack components. For example, the paginate_query function exists in all these locations and there are a few slight differences between most of them: https://github.com/openstack/ceilometer/blob/master/ceilometer/openstack/common/db/sqlalchemy/utils.py#L61 https://github.com/openstack/cinder/blob/master/cinder/openstack/common/db/sqlalchemy/utils.py#L37 https://github.com/openstack/glance/blob/master/glance/openstack/common/db/sqlalchemy/utils.py#L64 https://github.com/openstack/heat/blob/master/heat/openstack/common/db/sqlalchemy/utils.py#L62 https://github.com/openstack/keystone/blob/master/keystone/openstack/common/db/sqlalchemy/utils.py#L64 https://github.com/openstack/neutron/blob/master/neutron/openstack/common/db/sqlalchemy/utils.py#L61 https://github.com/openstack/nova/blob/master/nova/openstack/common/db/sqlalchemy/utils.py#L64 Does anyone know if there is any work going on to move stuff like this into oslo and then deprecate these functions? There are also many functions that process the REST API request parameters (getting the limit, marker, sort data, etc.) that are also replicated across many components. If no existing work is done in this area, how should this be tackled? As a blueprint for Juno? Thanks, Steven Kaufer Cloud Systems Software kau...@us.ibm.com 507-253-5104 Dept HMYS / Bld 015-2 / G119 / Rochester, MN 55901 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Steve, more info here on oslo-incubator: https://wiki.openstack.org/wiki/Oslo#Incubation Welcome! :) -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
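For readers not familiar with the helper in question, the duplicated function implements keyset pagination over a sqlalchemy query. Here is a condensed single-sort-key sketch of the idea (the real copies linked above build compound marker criteria over all sort keys):

    # Condensed sketch of the paginate_query idea; the real helper in
    # openstack/common/db/sqlalchemy/utils.py builds compound marker
    # criteria over all sort keys.
    def paginate_query(query, model, limit, sort_keys, marker=None,
                       sort_dir='asc'):
        # Apply a stable ordering over the sort keys.
        for key in sort_keys:
            column = getattr(model, key)
            query = query.order_by(column.asc() if sort_dir == 'asc'
                                   else column.desc())
        if marker is not None:
            # Keyset pagination: only return rows after the marker row.
            column = getattr(model, sort_keys[0])
            value = getattr(marker, sort_keys[0])
            query = query.filter(column > value if sort_dir == 'asc'
                                 else column < value)
        if limit is not None:
            query = query.limit(limit)
        return query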
Re: [openstack-dev] Constructive Conversations
On 3/7/2014 1:56 PM, Kurt Griffiths wrote: Folks, I’m sure that I’m not the first person to bring this up, but I’d like to get everyone’s thoughts on what concrete actions we, as a community, can take to improve the status quo. There have been a variety of instances where community members have expressed their ideas and concerns via email or at a summit, or simply submitted a patch that perhaps challenges someone’s opinion of The Right Way to Do It, and responses to that person have been far less constructive than they could have been[1]. In an open community, I don’t expect every person who comments on a ML post or a patch to be congenial, but I do expect community leaders to lead by example when it comes to creating an environment where every person’s voice is valued and respected. What if every time someone shared an idea, they could do so without fear of backlash and bullying? What if people could raise their concerns without being summarily dismissed? What if “seeking first to understand”[2] were a core value in our culture? It would not only accelerate our pace of innovation, but also help us better understand the needs of our cloud users, helping ensure we aren’t just building OpenStack in the right way, but also building /the right OpenStack/. We need open minds to build an open cloud. Many times, we /do/ have wonderful, constructive discussions, but the times we don’t cause wounds in the community that take a long time to heal. Psychologists tell us that it takes a lot of good experiences to make up for one bad one. I will be the first to admit I’m not perfect. Communication is hard. But I’m convinced we can do better. We /must/ do better. How can we build on what is already working, and make the bad experiences as rare as possible? A few ideas to seed the discussion: * Identify a set of core values that the community already embraces for the most part, and put them down “on paper.”[3] Leaders can keep these values fresh in everyone’s minds by (1) leading by example, and (2) referring to them regularly in conversations and talks. * PTLs can add mentoring skills and a mindset of “seeking first to understand” to their list of criteria for evaluating proposals to add a community member to a core team. * Get people together in person, early and often. Mid-cycle meetups and mini-summits provide much higher-resolution communication channels than email and IRC, and are great ways to clear up misunderstandings, build relationships of trust, and generally get everyone pulling in the same direction. What else can we do? Kurt [1] There are plenty of examples, going back years. Anyone who has been in the community very long will be able to recall some to mind. Recent ones I thought of include Barbican’s initial request for incubation on the ML, dismissive and disrespectful exchanges in some of the design sessions in Hong Kong (bordering on personal attacks), and the occasional “WTF?! This is the dumbest idea ever!” patch comment. [2] https://www.stephencovey.com/7habits/7habits-habit5.php [3] We already have a code of conduct https://www.openstack.org/legal/community-code-of-conduct/ but I think a list of core values would be easier to remember and allude to in day-to-day discussions. I’m trying to think of ways to make this idea practical. We need to stand up for our values, not just /say/ we have them.
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Not to detract from what you're saying, but this is 'meh' to me. My company has some different kind of values thing every 6 months it seems and maybe it's just me but I never really pay attention to any of it. I think I have to put something on my annual goals/results about it, but it's just fluffy wording. To me this is a self-policing community: if someone is being a dick, the others should call them on it, or the PTL for the project should stand up against it and set the tone for the community and culture his project wants to have. That's been my experience at least. Maybe some people would find codifying this helpful, but there are already lots of wikis and things that people can't remember on a daily basis so adding another probably isn't going to help the problem. Bullies don't tend to care about codes, but if people stand up against them in public they should be outcast. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Openstack-dev] [Nova] use Keystone V3 token to volume attachment
On 3/19/2014 2:48 AM, Shao Kai SK Li wrote: Hello: I am working on this patch (https://review.openstack.org/#/c/77524/) to fix bugs about volume attach failure with keystone V3 token. Just wondering, are there blueprints or plans in Juno to address keystone V3 support in nova? Thank you in advance. Best Regards~~~ Li, Shaokai ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I have this on the nova meeting agenda for tomorrow [1]. I would think at a minimum this means running compute tests in Tempest against a keystone v3 backend. I'm not sure what the current state of Tempest is regarding keystone v3. Note that this isn't the only thing that made it into nova in Icehouse related to keystone v3 [2]. [1] https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting [2] https://review.openstack.org/69972 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Openstack-dev] [Nova] use Keystone V3 token to volume attachment
On 3/19/2014 10:02 AM, Matthew Treinish wrote: On Wed, Mar 19, 2014 at 09:35:34AM -0500, Matt Riedemann wrote: On 3/19/2014 2:48 AM, Shao Kai SK Li wrote: Hello: I am working on this patch (https://review.openstack.org/#/c/77524/) to fix bugs about volume attach failure with keystone V3 token. Just wondering, are there blueprints or plans in Juno to address keystone V3 support in nova? Thank you in advance. Best Regards~~~ Li, Shaokai ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I have this on the nova meeting agenda for tomorrow [1]. I would think at a minimum this means running compute tests in Tempest against a keystone v3 backend. I'm not sure what the current state of Tempest is regarding keystone v3. Note that this isn't the only thing that made it into nova in Icehouse related to keystone v3 [2]. On the tempest side there are some dedicated keystone v3 api tests; I'm not sure how well things are covered there, though. As for using keystone v3 for auth for the other tests, tempest doesn't quite support that yet. Andrea Frittoli is working on a bp to get this working: https://blueprints.launchpad.net/tempest/+spec/multi-keystone-api-version-tests But, at this point it will probably end up being an early Juno thing before this can be enabled everywhere in tempest. -Matt Treinish ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Furthermore, Russell talked to Dolph in IRC and Dolph created this blueprint for planning the path forward from keystone v2 to v3: https://blueprints.launchpad.net/keystone/+spec/document-v2-to-v3-transition -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] An analysis of code review in Nova
On 3/22/2014 5:19 AM, Shawn Hartsock wrote: On Fri, Mar 14, 2014 at 6:58 PM, Dan Smith d...@danplanet.com wrote: Review latency will be directly affected by how good the refactoring changes are staged. If they are small, on-topic and easy to validate, they will go quickly. They should be linearized unless there are some places where multiple sequences of changes make sense (i.e. refactoring a single file that results in no changes required to others). I'm going to bring this to the next https://wiki.openstack.org/wiki/Meetings/VMwareAPI so we can start working on how we'll set the order for this kind of work. Currently we have a whole blueprint for refactoring a single method. That seems silly. I'll want to come up with a plan around how to restructure the driver so we can avoid some of the messes we've seen in the past. I think the point of starting with refactoring the nested method mess in the spawn method was that it (1) seemed relatively trivial (think fast review turnaround) and (2) should be a good bang for the buck kind of change, as a lot of the original complaint was related to how hard it is to verify changes in the giant spawn method are tested - which you also point out below. I want to avoid one big refactor effort that drags on, but I also want to address bigger problems we have inside the driver. For example, I also want to avoid a big refactor effort dragging on, and I like the thinking on design changes, but are they going to be happening at the same time? Or is the complete re-design going to supersede the refactoring? My only concern is biting off more than can be chewed in juno-1. Plus there is the refactor to use oslo.vmware; where does that fit into this? vm_util.py seems to have become burdened with work that it shouldn't have. It also performs a great number of unnecessary round trips using a vm_ref to pull individual vm details one at a time. Introducing a VirtualMachine object that held all these references would simplify some operations (I'm not the first person to suggest this and it wasn't novel to me either when it was presented.) It would seem Juno-1 would be the time to make these changes and we need to serialize this work to keep reviewers from losing their marbles trying to track it all. I would like to work out a plan for this in conjunction with interested core-reviewers who would be willing to more or less sponsor this work. Because of the problems Matt points out, I don't want to tackle this in a haphazard or piece-meal way since it could completely disrupt any new blueprint work people may have targeted for Juno. Yeah, definitely need a plan here. I'd like to see things prioritized based on what can be fixed in a relatively isolated way which gives a good return on coding/reviewing investment, e.g. pulling those nested methods out of spawn so they can be unit tested individually with mock rather than a large, seemingly rigid and scaffolded test framework. Having said that, on this driver, new blueprints in the last several cycles have introduced serious feature regressions. Several minor bug fixes have altered and introduced key architectural components that have broken multiple critical features. In my professional opinion this has a root cause based on the driver's tightly coupled and non-cohesive internal design. The driver design is tightly coupled in that a change in one area forces multiple updates and changes *throughout* the rest of the driver. This is true in testing as well.
The testing design often requires you to trace the entire codebase if you add a single optional parameter to a single method. This does not have to be true. Yup, this is my major complaint and ties into what I'm saying above: I find it really difficult to determine most of the time where a change is tested. Because of the nature of the driver code and my lack of actually writing features in it, as a reviewer I don't know if a change in X is going to break Y, so I rely on solid test coverage and the testing needs to be more natural to follow than it currently is. The driver design is non-cohesive in that important details and related information are spread throughout the driver. You must be aware at all times (for example) whether or not your current operation requires you to check if your vm_ref is outdated (we just worked on several last-minute critical bugs for RC1 where I and others pulled all-nighters to fix the issue in a bad case of Heroic Programming). I would like to stop the http://c2.com/cgi/wiki?CodeVomit please. May we? I know this isn't going to be easy so I'm really glad you're planning on tackling it in Juno. I'll tentatively sign up to help sponsor this but I'm not going to be able to commit all of my review bandwidth to a ton of changes for refactor and re-design. Hopefully targets will become more clear as the team gets the plans in place. -- Thanks, Matt Riedemann
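The VirtualMachine object idea could, for instance, fetch the properties a driver operation needs in one round trip and serve later reads from the cached result. Here is a sketch of the concept only; session.get_properties() is a stand-in for the actual vim_util call, and the property names are assumed for illustration:

    # Sketch of the VirtualMachine wrapper idea: one round trip for a
    # batch of properties instead of one vm_util helper call per value.
    # session.get_properties() is a stand-in, not the real API.
    class VirtualMachine(object):

        # Property paths the driver commonly needs (names illustrative).
        PROPERTIES = ['runtime.powerState', 'summary.config.numCpu',
                      'summary.config.memorySizeMB']

        def __init__(self, session, vm_ref):
            self._session = session
            self.vm_ref = vm_ref
            self._props = None

        def refresh(self):
            # A single RetrievePropertiesEx-style call for everything,
            # refreshed explicitly when the caller knows it is stale.
            self._props = self._session.get_properties(self.vm_ref,
                                                       self.PROPERTIES)

        def get(self, name):
            if self._props is None:
                self.refresh()
            return self._props[name]

A wrapper like this would also give one obvious place to handle a stale vm_ref, instead of every call site having to remember to check.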
Re: [openstack-dev] [qa] [neutron] Neutron Full Parallel job - Last 4 days failures
On 3/27/2014 8:00 AM, Salvatore Orlando wrote: On 26 March 2014 19:19, James E. Blair jebl...@openstack.org mailto:jebl...@openstack.org wrote: Salvatore Orlando sorla...@nicira.com mailto:sorla...@nicira.com writes: On another note, we noticed that the duplicated jobs currently executed for redundancy in neutron actually seem to point all to the same build id. I'm not sure then if we're actually executing each job twice or just duplicating lines in the jenkins report. Thanks for catching that, and I'm sorry that didn't work right. Zuul is in fact running the jobs twice, but it is only looking at one of them when sending reports and (more importantly) deciding whether the change has succeeded or failed. Fixing this is possible, of course, but turns out to be a rather complicated change. Since we don't make heavy use of this feature, I lean toward simply instantiating multiple instances of identically configured jobs and invoking them (e.g. neutron-pg-1, neutron-pg-2). Matthew Treinish has already worked up a patch to do that, and I've written a patch to revert the incomplete feature from Zuul. That makes sense to me. I think it is just a matter of how the results are reported to gerrit, since from what I gather in logstash the jobs are executed twice for each new patchset or recheck. As for the status of the full job, I took a look at the numbers reported by Rossella. All the bugs are already known; some of them are not even bugs; others have been recently fixed (given the time span of Rossella's analysis and the fact that it also covers non-rebased patches, this kind of false positive is possible). Of all full job failures, 44% should be discarded. Bug 1291611 (12%) is definitely not a neutron bug... hopefully. Bug 1281969 (12%) is really too generic. It bears the hallmark of bug 1283522, and therefore the high number might be due to the fact that trunk was plagued by this bug up to a few days before the analysis. However, it's worth noting that there is also another instance of lock timeout which has caused 11 failures in the full job in the past week. A new bug has been filed for this issue: https://bugs.launchpad.net/neutron/+bug/1298355 Bug 1294603 was related to a test now skipped. It is still being debated whether the problem lies in test design, neutron LBaaS or neutron L3. The following bugs seem not to be neutron bugs: 1290642, 1291920, 1252971, 1257885. Bug 1292242 appears to have been fixed while the analysis was going on. Bug 1277439 instead is already known to affect neutron jobs occasionally. The actual state of the job is perhaps better than what the raw numbers say. I would keep monitoring it, and then make it voting after the Icehouse release is cut, so that we'll be able to deal with a possibly higher failure rate in the quiet period of the release cycle. -Jim ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org mailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I reported this bug [1] yesterday. This was hit in our internal Tempest runs on RHEL 6.5 with x86_64 and the nova libvirt driver with the neutron openvswitch ML2 driver. We're running without tenant isolation on python 2.6 (no testr yet) so the tests are in serial. We're running basically the full API/CLI/Scenarios tests though, no filtering on the smoke tag.
Out of 1,971 tests run, we had 3 failures where a nova instance failed to spawn because networking callback events failed, i.e. neutron sends a server event request to nova and it's a bad URL so nova API pukes and then the networking request in neutron server fails. As linked in the bug report I'm seeing the same neutron server log error showing up in logstash for community jobs but it's not 100% failure. I haven't seen the n-api log error show up in logstash though. Just bringing this to people's attention in case anyone else sees it. [1] https://bugs.launchpad.net/nova/+bug/1298640 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] Looking for clarification on the diagnostics API
Tempest recently got some new tests for the nova diagnostics API [1] which failed when I was running against the powervm driver since it doesn't implement that API. I started looking at other drivers that did and found that libvirt, vmware and xenapi at least had code for the get_diagnostics method. I found that the vmware driver was reusing its get_info method for get_diagnostics which led to bug 1237622 [2] but overall caused some confusion about the difference between the compute driver's get_info and get_diagnostics methods. It looks like get_info is mainly just used to get the power_state of the instance. First, the get_info method has a nice docstring for what it needs returned [3] but the get_diagnostics method doesn't [4]. From looking at the API docs [5], the diagnostics API basically gives an example of values to get back which is completely based on what the libvirt driver returns. Looking at the xenapi driver code, it looks like it does things a bit differently than the libvirt driver (maybe doesn't return the exact same keys, but it returns information based on what Xen provides). I'm thinking about implementing the diagnostics API for the powervm driver but I'd like to try and get some help on defining just what should be returned from that call. There are some IVM commands available to the powervm driver for getting hardware resource information about an LPAR so I think I could implement this pretty easily. I think it basically comes down to providing information about the processor, memory, storage and network interfaces for the instance but if anyone has more background information on that API I'd like to hear it. [1] https://github.com/openstack/tempest/commit/da0708587432e47f85241201968e6402190f0c5d [2] https://bugs.launchpad.net/nova/+bug/1237622 [3] https://github.com/openstack/nova/blob/2013.2.rc1/nova/virt/driver.py#L144 [4] https://github.com/openstack/nova/blob/2013.2.rc1/nova/virt/driver.py#L299 [5] http://paste.openstack.org/show/48236/ Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
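Since there is no documented contract, the hypervisor-specific nature of the API shows up directly in the driver code. A powervm implementation might end up looking something like this sketch, with keys loosely modeled on the libvirt example in [5]; the stats helper is a stand-in for whatever the IVM commands actually provide, purely illustrative:

    # Purely illustrative get_diagnostics() sketch; the keys mimic the
    # libvirt-style example in [5] and self._operator.get_lpar_stats()
    # is a stand-in for the IVM commands mentioned above.
    def get_diagnostics(self, instance_name):
        stats = self._operator.get_lpar_stats(instance_name)
        return {
            'cpu0_time': stats['cpu_time'],
            'memory': stats['memory_mb'] * 1024,
            'memory-actual': stats['memory_used_mb'] * 1024,
            'vnet0_rx': stats['net_rx_bytes'],
            'vnet0_tx': stats['net_tx_bytes'],
            'vda_read': stats['disk_read_bytes'],
            'vda_write': stats['disk_write_bytes'],
        }

That each driver invents its own keys like this is exactly why defining a common minimum (processor, memory, storage, network interfaces) would help.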
Re: [openstack-dev] [nova] Looking for clarification on the diagnostics API
Looks like this has been brought up a couple of times: https://lists.launchpad.net/openstack/msg09138.html https://lists.launchpad.net/openstack/msg08555.html But they seem to kind of end up in the same place I already am - it seems to be an open-ended API that is hypervisor-specific. Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: Matt Riedemann/Rochester/IBM To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/10/2013 02:12 PM Subject: [nova] Looking for clarification on the diagnostics API Tempest recently got some new tests for the nova diagnostics API [1] which failed when I was running against the powervm driver since it doesn't implement that API. I started looking at other drivers that did and found that libvirt, vmware and xenapi at least had code for the get_diagnostics method. I found that the vmware driver was reusing its get_info method for get_diagnostics which led to bug 1237622 [2] but overall caused some confusion about the difference between the compute driver's get_info and get_diagnostics methods. It looks like get_info is mainly just used to get the power_state of the instance. First, the get_info method has a nice docstring for what it needs returned [3] but the get_diagnostics method doesn't [4]. From looking at the API docs [5], the diagnostics API basically gives an example of values to get back which is completely based on what the libvirt driver returns. Looking at the xenapi driver code, it looks like it does things a bit differently than the libvirt driver (maybe doesn't return the exact same keys, but it returns information based on what Xen provides). I'm thinking about implementing the diagnostics API for the powervm driver but I'd like to try and get some help on defining just what should be returned from that call. There are some IVM commands available to the powervm driver for getting hardware resource information about an LPAR so I think I could implement this pretty easily. I think it basically comes down to providing information about the processor, memory, storage and network interfaces for the instance but if anyone has more background information on that API I'd like to hear it. [1] https://github.com/openstack/tempest/commit/da0708587432e47f85241201968e6402190f0c5d [2] https://bugs.launchpad.net/nova/+bug/1237622 [3] https://github.com/openstack/nova/blob/2013.2.rc1/nova/virt/driver.py#L144 [4] https://github.com/openstack/nova/blob/2013.2.rc1/nova/virt/driver.py#L299 [5] http://paste.openstack.org/show/48236/ Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
Based on the discussion with Russell and Dan Smith in the nova meeting today, here are some of my notes from the meeting that can continue the discussion. These are all pretty rough at the moment so please bear with me; this is more to just get the ball rolling on ideas. Notes on powervm CI:

1. What OS to run on? Fedora 19, RHEL 6.4? - Either of those is probably fine, we use RHEL 6.4 right now internally.

2. Deployment - RDO? SmokeStack? Devstack? - SmokeStack is preferable since it packages rpms, which is what we're using internally.

3. Backing database - mysql or DB2 10.5? - Prefer DB2 since that's what we want to support in Icehouse and it's what we use internally, but there are differences in how long it takes to create a database with DB2 versus MySQL, so when you multiply that times 7 databases (keystone, cinder, glance, nova, heat, neutron, ceilometer) it's going to add up unless we can figure out a better way to do it (single database with multiple schemas?). Internally we use a pre-created image with the DB2 databases already created; we just run the migrate scripts against them so we don't have to wait for the create times every run - would that fly in community?

4. What is the max amount of time for us to report test results? Dan didn't seem to think 48 hours would fly. :)

5. What are the minimum tests that need to run (excluding APIs that the powervm driver doesn't currently support)? - smoke/gate/negative/whitebox/scenario/cli? Right now we have 1152 tempest tests running, those are only within api/scenario/cli and we don't run everything.

6. Network service? We're running with openvswitch 1.10 today so we probably want to continue with that if possible.

7. Cinder backend? We're running with the storwize driver, but what do we do about the remote v7000?

Again, just getting some thoughts out there to help us figure out our goals for this, especially around 4 and 5. Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Hyper-V] Havana status
Getting integration testing hooked up for the hyper-v driver with tempest should go a long way here which is a good reason to have it. As has been mentioned, there is a core team of people that understand the internals of the hyper-v driver and the subtleties of when it won't work, and only those with a vested interest in using it will really care about it. My team has the same issue with the powervm driver. We don't have community integration testing hooked up yet. We run tempest against it internally so we know what works and what doesn't, but besides standard code review practices that apply throughout everything (strong unit test coverage, consistency with other projects, hacking rules, etc), any other reviewer has to generally take it on faith that what's in there works as it's supposed to. Sure, there is documentation available on what the native commands do and anyone can dig into those to figure it out, but I wouldn't expect that low-level of review from anyone that doesn't regularly work on the powervm driver. I think the same is true for anything here. So the equalizer is a rigorously tested and broad set of integration tests, which is where we all need to get to with tempest and continuous integration. We've had the same issues as mentioned in the original note about things slipping out of releases or taking a long time to get reviewed, and we've had to fork code internally because of it which we then have to continue to try and get merged upstream - and it's painful, but it is what it is, that's the nature of the business. Personally my experience has been that the more I give the more I get. The more I'm involved in what others are doing and the more I review other's code, the more I can build a relationship which is mutually beneficial. Sometimes I can only say 'hey, you need unit tests for this or this doesn't seem right but I'm not sure', but unless you completely automate code coverage metrics and build that back into reviews, e.g. does your 1000 line blueprint have 95% code coverage in the tests, you still need human reviewers on everything, regardless of context. Even then it's not going to be enough, there will always be a need for people with a broader vision of the project as a whole that can point out where things are going in the wrong direction even if it fixes a bug. The point is I see both sides of the argument, I'm sure many people do. In a large complicated project like this it's inevitable. But I think the quality and adoption of OpenStack speaks for itself and I believe a key component of that is the review system and that's only as good as the people which are going to uphold the standards across the project. I've been on enough development projects that give plenty of lip service to code quality and review standards which are always the first thing to go when a deadline looms, and those projects are always ultimately failures. Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: Tim Smith tsm...@gridcentric.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/10/2013 07:48 PM Subject:Re: [openstack-dev] [Hyper-V] Havana status On Thu, Oct 10, 2013 at 1:50 PM, Russell Bryant rbry...@redhat.com wrote: Please understand that I only want to help here. 
Perhaps a good way for you to get more review attention is get more karma in the dev community by helping review other patches. It looks like you don't really review anything outside of your own stuff, or patches that touch hyper-v. In the absence of significant interest in hyper-v from others, the only way to get more attention is by increasing your karma. NB: I don't have any vested interest in this discussion except that I want to make sure OpenStack stays Open, i.e. inclusive. I believe the concept of reviewer karma, while seemingly sensible, is actually subtly counter to the goals of openness, innovation, and vendor neutrality, and would also lead to overall lower commit quality. Brian Kernighan famously wrote: Debugging is twice as hard as writing the code in the first place. A corollary is that constructing a mental model of code is hard; perhaps harder than writing the code in the first place. It follows that reviewing code is not an easy task, especially if one has not been intimately involved in the original development of the code under review. In fact, if a reviewer is not intimately familiar with the code under review, and therefore only able to perform the functions of human compiler and style-checker (functions which can be and typically are performed by automatic tools), the rigor of their review is at best less-than-ideal, and at worst purely symbolic. It is logical, then, that a reviewer should review
Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
Dan Smith d...@danplanet.com wrote on 10/10/2013 08:26:14 PM: From: Dan Smith d...@danplanet.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/10/2013 08:31 PM Subject: Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI 4. What is the max amount of time for us to report test results? Dan didn't seem to think 48 hours would fly. :) Honestly, I think that 12 hours during peak times is the upper limit of what could be considered useful. If it's longer than that, many patches could go into the tree without a vote, which defeats the point. Yeah, I was just joking about the 48 hour thing, 12 hours seems excessive but I guess that has happened when things are super backed up with gate issues and rechecks. Right now things take about 4 hours, with Tempest being around 1.5 hours of that. The rest of the time is setup and install, which includes heat and ceilometer. So I guess that raises another question, if we're really setting this up right now because of nova, do we need to have heat and ceilometer installed and configured in the initial delivery of this if we're not going to run tempest tests against them (we don't right now)? I think some aspect of the slow setup time is related to DB2 and how the migrations perform with some of that, but the overall time is not considerably different from when we were running this with MySQL so I'm reluctant to blame it all on DB2. I think some of our topology could have something to do with it too since the IVM hypervisor is running on a separate system and we are gated on how it's performing at any given time. I think that will be our biggest challenge for the scale issues with community CI. 5. What are the minimum tests that need to run (excluding APIs that the powervm driver doesn't currently support)? - smoke/gate/negative/whitebox/scenario/cli? Right now we have 1152 tempest tests running, those are only within api/scenario/cli and we don't run everything. I think that a full run of tempest should be required. That said, if there are things that the driver legitimately doesn't support, it makes sense to exclude those from the tempest run, otherwise it's not useful. I think you should publish the tempest config (or config script, or patch, or whatever) that you're using so that we can see what it means in terms of the coverage you're providing. Just to clarify, do you mean publish what we are using now or publish once it's all working? I can certainly attach our nose.cfg and latest x-unit results xml file. 6. Network service? We're running with openvswitch 1.10 today so we probably want to continue with that if possible. Hmm, so that means neutron? AFAIK, not much of tempest runs with Nova/Neutron. I kinda think that since nova-network is our default right now (for better or worse) that the run should include that mode, especially if using neutron excludes a large portion of the tests. I think you said you're actually running a bunch of tempest right now, which conflicts with my understanding of neutron workiness. Can you clarify? Correct, we're running with neutron using the ovs plugin. We basically have the same issues that the neutron gate jobs have, which is related to concurrency issues and tenant isolation (we're doing the same as devstack with neutron in that we don't run tempest with tenant isolation). 
We are running most of the nova and most of the neutron API tests though (we don't have all of the neutron-dependent scenario tests working though, probably more due to incompetence in setting up neutron than anything else). 7. Cinder backend? We're running with the storwize driver, but what do we do about the remote v7000? Is there any reason not to just run with a local LVM setup like we do in the real gate? I mean, additional coverage for the v7000 driver is great, but if it breaks and causes you to not have any coverage at all, that seems, like, bad to me :) Yeah, I think we'd just run with a local LVM setup, that's what we do for x86_64 and s390x tempest runs. For whatever reason we thought we'd do storwize for our ppc64 runs, probably just to have a matrix of coverage. Again, just getting some thoughts out there to help us figure out our goals for this, especially around 4 and 5. Yeah, thanks for starting this discussion! --Dan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] dd performance for wipe in cinder
Have you looked at the volume_clear and volume_clear_size options in cinder.conf? https://github.com/openstack/cinder/blob/2013.2.rc1/etc/cinder/cinder.conf.sample#L1073 The default is to zero out the volume. You could try 'none' to see if that helps with performance. Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: cosmos cosmos cosmos0...@gmail.com To: openstack-dev@lists.openstack.org, Date: 10/11/2013 04:26 AM Subject: [openstack-dev] dd performance for wipe in cinder Hello. My name is Rucia, from Samsung SDS. I am having trouble with cinder volume deletion. I am working on supporting big data storage in LVM, but deleting a cinder LVM volume takes too much time because of dd. The cinder volume is 200GB, for hadoop master data. When I delete a cinder volume, using 'dd if=/dev/zero of=$cinder-volume count=100 bs=1M', it takes about 30 minutes. Is there a better and quicker way of deleting? Cheers, Rucia. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
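As rough arithmetic on why this takes so long: zeroing 200GB at a typical sequential-write rate of around 110 MB/s works out to about half an hour, which matches the report above. As a minimal sketch, the relevant cinder.conf settings look something like this (option names per the Havana sample config linked above; the values are illustrative):

    [DEFAULT]
    # How volumes are wiped on delete: 'zero' (the default, dd from
    # /dev/zero), 'shred', or 'none' to skip wiping entirely.
    volume_clear = none
    # When wiping, only clear the first N MiB of the volume
    # (0 means wipe the whole volume).
    volume_clear_size = 0

Keep the trade-off in mind: 'none' avoids the dd cost, but it means deleted volume data may be readable by the next tenant that is allocated those LVM extents.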
Re: [openstack-dev] [Hyper-V] Havana status
I'd like to see the powervm driver fall into that first category. We don't nearly have the rapid development that the hyper-v driver does, but we do have some out of tree stuff anyway simply because it hasn't landed upstream yet (DB2, config drive support for the powervm driver, etc), and maintaining that out of tree code is not fun. So I definitely don't want to move out of tree. Given that, I think at least I'm trying to contribute overall [1][2] by doing reviews outside my comfort zone, bug triage, fixing bugs when I can, and because we run tempest in house (with neutron-openvswitch) we find issues there that I get to push patches for. Having said all that, it's moot for the powervm driver if we don't get the CI hooked up in Icehouse and I completely understand that so it's a top priority. [1] http://stackalytics.com/?release=havana&metric=commits&project_type=openstack&module=&company=&user_id=mriedem [2] https://review.openstack.org/#/q/reviewer:6873+project:openstack/nova,n,z Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: Russell Bryant rbry...@redhat.com To: openstack-dev@lists.openstack.org, Date: 10/11/2013 11:33 AM Subject: Re: [openstack-dev] [Hyper-V] Havana status On 10/11/2013 12:04 PM, John Griffith wrote: On Fri, Oct 11, 2013 at 9:12 AM, Bob Ball bob.b...@citrix.com wrote: -Original Message- From: Russell Bryant [mailto:rbry...@redhat.com] Sent: 11 October 2013 15:18 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Hyper-V] Havana status As a practical example for Nova: in our case that would simply include the following subtrees: nova/virt/hyperv and nova/tests/virt/hyperv. If maintainers of a particular driver would prefer this sort of autonomy, I'd rather look at creating new repositories. I'm completely open to going that route on a per-driver basis. Thoughts? I think that all drivers that are officially supported must be treated in the same way. If we are going to split out drivers into a separate but still official repository then we should do so for all drivers. This would allow Nova core developers to focus on the architectural side rather than how each individual driver implements the API that is presented. Of course, with the current system it is much easier for a Nova core to identify and request a refactor or generalisation of code written in one or multiple drivers so they work for all of the drivers - we've had a few of those with XenAPI where code we have written has been pushed up into Nova core rather than the XenAPI tree. Perhaps one approach would be to re-use the incubation approach we have; if drivers want to have the fast-development cycles uncoupled from core reviewers then they can be moved into an incubation project. When there is a suitable level of integration (and automated testing to maintain it of course) then they can graduate. I imagine at that point there will be more development of new features which affect Nova in general (to expose each hypervisor's strengths), so there would be fewer cases of them being restricted just to the virt/* tree.
Bob ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org mailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I've thought about this in the past, but always come back to a couple of things. Being a community driven project, if a vendor doesn't want to participate in the project then why even pretend (ie having their own project/repo, reviewers etc). Just post your code up in your own github and let people that want to use it pull it down. If it's a vendor project, then that's fine; have it be a vendor project. In my opinion pulling out and leaving things up to the vendors as is being described has significant negative impacts. Not the least of which is consistency in behaviors. On the Cinder side, the core team spends the bulk of their review time looking at things like consistent behaviors, missing features or paradigms that are introduced that break other drivers. For example looking at things like, are all the base features implemented, do they work the same way, are we all using the same vocabulary, will it work in an multi-backend environment. In addition, it's rare that a vendor implements a new feature in their driver that doesn't impact/touch the core code somewhere. Having
Re: [openstack-dev] [nova] Looking for clarification on the diagnostics API
There is also a tempest patch now to relax some of the libvirt-specific keys checked in the new diagnostics tests there: https://review.openstack.org/#/c/51412/ To relay some of my concerns that I put in that patch: I'm not sure how I feel about this. It should probably be more generic but I think we need more than just a change in tempest to enforce it, i.e. we should have a nova patch that changes the doc strings for the abstract compute driver method to specify what the minimum keys are for the info returned, maybe a doc api sample change, etc? For reference, here is the mailing list post I started on this last week: http://lists.openstack.org/pipermail/openstack-dev/2013-October/016385.html There are also docs here (these examples use xen and libvirt): http://docs.openstack.org/grizzly/openstack-compute/admin/content/configuring-openstack-compute-basics.html And under procedure 4.4 here: http://docs.openstack.org/admin-guide-cloud/content/ch_introduction-to-openstack-compute.html#section_manage-the-cloud I also found this wiki page related to metering and the nova diagnostics API: https://wiki.openstack.org/wiki/EfficientMetering/FutureNovaInteractionModel So it seems like if at some point this will be used with ceilometer it should be standardized a bit, which is what the Tempest part starts, but I don't want it to get lost there. Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: Gary Kotton gkot...@vmware.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/12/2013 01:42 PM Subject: Re: [openstack-dev] [nova] Looking for clarification on the diagnostics API Yup, it seems to be hypervisor specific. I have added in the VMware support following your corrections in the VMware driver. Thanks Gary From: Matt Riedemann mrie...@us.ibm.com Reply-To: OpenStack Development Mailing List openstack-dev@lists.openstack.org Date: Thursday, October 10, 2013 10:17 PM To: OpenStack Development Mailing List openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] Looking for clarification on the diagnostics API Looks like this has been brought up a couple of times: https://lists.launchpad.net/openstack/msg09138.html https://lists.launchpad.net/openstack/msg08555.html But they seem to kind of end up in the same place I already am - it seems to be an open-ended API that is hypervisor-specific. Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: Matt Riedemann/Rochester/IBM To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/10/2013 02:12 PM Subject: [nova] Looking for clarification on the diagnostics API Tempest recently got some new tests for the nova diagnostics API [1] which failed when I was running against the powervm driver since it doesn't implement that API. I started looking at other drivers that did and found that libvirt, vmware and xenapi at least had code for the get_diagnostics method. I found that the vmware driver was re-using its get_info method for get_diagnostics, which led to bug 1237622 [2] but overall caused some confusion about the difference between the compute driver's get_info and get_diagnostics methods.
It looks like get_info is mainly just used to get the power_state of the instance. First, the get_info method has a nice docstring for what it needs returned [3] but the get_diagnostics method doesn't [4]. From looking at the API docs [5], the diagnostics API basically gives an example of values to get back which is completely based on what the libvirt driver returns. Looking at the xenapi driver code, it looks like it does things a bit differently than the libvirt driver (maybe doesn't return the exact same keys, but it returns information based on what Xen provides). I'm thinking about implementing the diagnostics API for the powervm driver but I'd like to try and get some help on defining just what should be returned from that call. There are some IVM commands available to the powervm driver for getting hardware resource information about an LPAR so I think I could implement this pretty easily. I think it basically comes down to providing information about the processor, memory, storage and network interfaces for the instance but if anyone has more background information on that API I'd like to hear it. [1] https://github.com/openstack/tempest/commit/da0708587432e47f85241201968e6402190f0c5d [2] https://bugs.launchpad.net/nova/+bug/1237622 [3] https://github.com/openstack/nova/blob/2013.2.rc1/nova/virt/driver.py#L144 [4
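To make the question concrete, here is a rough, hypothetical sketch of the shape a powervm get_diagnostics() could take - this is not the actual driver code, and the helper (self._operator.get_lpar) and the LPAR attribute names are invented for illustration:

    # hypothetical sketch only; not the real powervm driver code
    def get_diagnostics(self, instance):
        """Return diagnostics for an instance as a flat dict."""
        lpar = self._operator.get_lpar(instance['name'])
        return {
            'cpu.curr_proc_units': lpar['curr_proc_units'],
            'memory.curr_mem_mb': lpar['curr_mem'],
            'net.vnet0.rx_bytes': lpar['vnet0_rx'],
            'disk.vda.write_bytes': lpar['vda_write'],
        }

The open question is exactly this key naming: nothing in the virt driver contract says which keys a caller can rely on across the libvirt, xenapi, vmware and powervm drivers.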
Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
I just opened this bug, it's going to be one of the blockers for us to get PowerVM CI going in Icehouse: https://bugs.launchpad.net/nova/+bug/1241619 Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: Matt Riedemann/Rochester/IBM@IBMUS To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/11/2013 10:59 AM Subject: Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI Matthew Treinish mtrein...@kortar.org wrote on 10/10/2013 10:31:29 PM: From: Matthew Treinish mtrein...@kortar.org To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/10/2013 11:07 PM Subject: Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI On Thu, Oct 10, 2013 at 07:39:37PM -0700, Joe Gordon wrote: On Thu, Oct 10, 2013 at 7:28 PM, Matt Riedemann mrie...@us.ibm.com wrote: 4. What is the max amount of time for us to report test results? Dan didn't seem to think 48 hours would fly. :) Honestly, I think that 12 hours during peak times is the upper limit of what could be considered useful. If it's longer than that, many patches could go into the tree without a vote, which defeats the point. Yeah, I was just joking about the 48 hour thing, 12 hours seems excessive but I guess that has happened when things are super backed up with gate issues and rechecks. Right now things take about 4 hours, with Tempest being around 1.5 hours of that. The rest of the time is setup and install, which includes heat and ceilometer. So I guess that raises another question, if we're really setting this up right now because of nova, do we need to have heat and ceilometer installed and configured in the initial delivery of this if we're not going to run tempest tests against them (we don't right now)? In general the faster the better, and if things get slow enough that we have to wait for powervm CI to report back, I think it's reasonable to go ahead and approve things without hearing back. In reality if you can report back in under 12 hours this will rarely happen (I think). I think some aspect of the slow setup time is related to DB2 and how the migrations perform with some of that, but the overall time is not considerably different from when we were running this with MySQL so I'm reluctant to blame it all on DB2. I think some of our topology could have something to do with it too since the IVM hypervisor is running on a separate system and we are gated on how it's performing at any given time. I think that will be our biggest challenge for the scale issues with community CI. 5. What are the minimum tests that need to run (excluding APIs that the powervm driver doesn't currently support)? - smoke/gate/negative/whitebox/scenario/cli? Right now we have 1152 tempest tests running, those are only within api/scenario/cli and we don't run everything.
Here is the nose.cfg we run with: Some of the tests are excluded because of performance issues that still need to be worked out (like test_list_image_filters - it works but it takes over 20 minutes sometimes). Some of the tests are excluded because of limitations with DB2, e.g. test_list_servers_filtered_by_name_wildcard. Some of them are probably old excludes on bugs that are now fixed. We have to go back through what's excluded every once in a while to figure out what's still broken and clean things up. Here is the tempest.cfg we use on ppc64: And here are the xunit results from our latest run: Note that we have known issues with some cinder and neutron failures in there. I think that a full run of tempest should be required. That said, if there are things that the driver legitimately doesn't support, it makes sense to exclude those from the tempest run, otherwise it's not useful. ++ I think you should publish the tempest config (or config script, or patch, or whatever) that you're using so that we can see what it means in terms of the coverage you're providing. Just to clarify, do you mean publish what we are using now or publish once it's all working? I can certainly attach
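The nose.cfg itself was an attachment and isn't reproduced here, but for illustration, an exclusion list like the one described can be expressed with the standard nosetests exclude regex; the test names below are the ones mentioned above, and everything else in this sketch is an assumption:

    [nosetests]
    verbosity=2
    # slow (test_list_image_filters) or hitting DB2 limitations
    # (test_list_servers_filtered_by_name_wildcard)
    exclude=(test_list_image_filters|test_list_servers_filtered_by_name_wildcard)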
Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
And this guy: https://bugs.launchpad.net/nova/+bug/1241628 Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: Matt Riedemann/Rochester/IBM@IBMUS To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/18/2013 09:25 AM Subject:Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI I just opened this bug, it's going to be one of the blockers for us to get PowerVM CI going in Icehouse: https://bugs.launchpad.net/nova/+bug/1241619 Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From:Matt Riedemann/Rochester/IBM@IBMUS To:OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date:10/11/2013 10:59 AM Subject:Re: [openstack-dev] [nova][powervm] my notes from the meeting onpowervm CI Matthew Treinish mtrein...@kortar.org wrote on 10/10/2013 10:31:29 PM: From: Matthew Treinish mtrein...@kortar.org To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/10/2013 11:07 PM Subject: Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI On Thu, Oct 10, 2013 at 07:39:37PM -0700, Joe Gordon wrote: On Thu, Oct 10, 2013 at 7:28 PM, Matt Riedemann mrie...@us.ibm.com wrote: 4. What is the max amount of time for us to report test results? Dan didn't seem to think 48 hours would fly. :) Honestly, I think that 12 hours during peak times is the upper limit of what could be considered useful. If it's longer than that, many patches could go into the tree without a vote, which defeats the point. Yeah, I was just joking about the 48 hour thing, 12 hours seems excessive but I guess that has happened when things are super backed up with gate issues and rechecks. Right now things take about 4 hours, with Tempest being around 1.5 hours of that. The rest of the time is setup and install, which includes heat and ceilometer. So I guess that raises another question, if we're really setting this up right now because of nova, do we need to have heat and ceilometer installed and configured in the initial delivery of this if we're not going to run tempest tests against them (we don't right now)? In general the faster the better, and if things get to slow enough that we have to wait for powervm CI to report back, I think its reasonable to go ahead and approve things without hearing back. In reality if you can report back in under 12 hours this will rarely happen (I think). I think some aspect of the slow setup time is related to DB2 and how the migrations perform with some of that, but the overall time is not considerably different from when we were running this with MySQL so I'm reluctant to blame it all on DB2. I think some of our topology could have something to do with it too since the IVM hypervisor is running on a separate system and we are gated on how it's performing at any given time. I think that will be our biggest challenge for the scale issues with community CI. 5. What are the minimum tests that need to run (excluding APIs that the powervm driver doesn't currently support)? - smoke/gate/negative/whitebox/scenario/cli? Right now we have 1152 tempest tests running, those are only within api/scenario/cli and we don't run everything. 
Well that's almost a full run right now, the full tempest jobs have 1290 tests of which we skip 65 because of bugs or configuration. (don't run neutron api tests without neutron) That number is actually pretty high since you are running with neutron. Right now the neutron gating jobs only have 221 jobs and skip 8 of those. Can you share the list of things you've got working with neutron so we can up the number of gating tests? Here is the nose.cfg we run with: Some of the tests are excluded because of performance issues that still need to be worked out (like test_list_image_filters - it works but it takes over 20 minutes sometimes). Some of the tests are excluded because of limitations with DB2, e.g. test_list_servers_filtered_by_name_wildcard Some of them are probably old excludes on bugs that are now fixed. We have to go back through what's excluded every once in awhile to figure out what's still broken and clean things up. Here is the tempest.cfg we use on ppc64: And here are the xunit results from our latest run: Note that we have known issues with some cinder and neutron failures in there. I think
Re: [openstack-dev] [Neutron] IPv6 DHCP options for dnsmasq
FWIW, we've wanted IPv6 support too but there are limitations in sqlalchemy and python 2.6, and since openstack is still supporting both of those, we are gated on that. Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: Sean M. Collins s...@coreitpro.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/22/2013 10:33 AM Subject: Re: [openstack-dev] [Neutron] IPv6 DHCP options for dnsmasq On Tue, Oct 22, 2013 at 08:58:52AM +0200, Luke Gorrie wrote: Deutsche Telekom too. We are working on making Neutron interoperate well with a service provider network that's based on IPv6. I look forward to talking about this with people in Hong Kong :) I may be mistaken, but I don't see a summit proposal for Neutron, on the subject of IPv6. Are there plans to have one? -- Sean M. Collins ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Openstack on power pc/Freescale linux
We run openstack on ppc64 with RHEL 6.4 using the powervm nova virt driver. What do you want to know? Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: Qing He qing...@radisys.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/22/2013 05:49 PM Subject: [openstack-dev] [nova] Openstack on power pc/Freescale linux All, I'm wondering if anyone has tried OpenStack on PowerPC / Freescale Linux? Thanks, Qing ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Openstack on power pc/Freescale linux
Yeah, my team does. We're using openvswitch 1.10, qpid 0.22, DB2 10.5 (but MySQL also works). Do you have specific issues/questions? We're working on getting continuous integration testing working for the nova powervm driver in the icehouse release, so you can see some more details about what we're doing with openstack on power in this thread: http://lists.openstack.org/pipermail/openstack-dev/2013-October/016395.html Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: Qing He qing...@radisys.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/22/2013 07:43 PM Subject: Re: [openstack-dev] [nova] Openstack on power pc/Freescale linux Thanks Matt. I'd like to know if anyone has tried to run the controller, API server, MySQL database, message queue, etc. (the brain of OpenStack) on ppc. Qing From: Matt Riedemann [mailto:mrie...@us.ibm.com] Sent: Tuesday, October 22, 2013 4:17 PM To: OpenStack Development Mailing List Subject: Re: [openstack-dev] [nova] Openstack on power pc/Freescale linux We run openstack on ppc64 with RHEL 6.4 using the powervm nova virt driver. What do you want to know? Thanks, MATT RIEDEMANN Advisory Software Engineer Cloud Solutions and OpenStack Development Phone: 1-507-253-7622 | Mobile: 1-507-990-1889 E-mail: mrie...@us.ibm.com 3605 Hwy 52 N Rochester, MN 55901-1407 United States From: Qing He qing...@radisys.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/22/2013 05:49 PM Subject: [openstack-dev] [nova] Openstack on power pc/Freescale linux All, I'm wondering if anyone has tried OpenStack on PowerPC / Freescale Linux? Thanks, Qing ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [sqlalchemy-migrate] Blueprint for review: Add DB2 10.5 Support
I've got a sqlalchemy-migrate blueprint up for review to add DB2 support in migrate. https://blueprints.launchpad.net/sqlalchemy-migrate/+spec/add-db2-support This is a pre-req for getting DB2 support into Nova so I'm targeting icehouse-1. We've been running with the migrate patches internally since Folsom, but getting them into migrate was difficult before OpenStack took over maintenance of the project. Please let me know if there are any questions/issues or something I need to address here. Thanks, Matt Riedemann Cloud Solutions and OpenStack Development Email: mrie...@us.ibm.com Office Phone: 507-253-7622___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Do we have some guidelines for mock, stub, mox when writing unit test?
I don't see anything explicit in the wiki and hacking guides, they mainly just say to have unit tests for everything and tell you how to run/debug them. Generally mock is supposed to be used over mox now for python 3 support. There is also a blueprint to remove the usage of mox in neutron: https://blueprints.launchpad.net/neutron/+spec/remove-mox For all new patches, we should be using mock over mox because of the python 3 support of mock (and lack thereof for mox). As for when to use mock vs stubs, I think you'll get different opinions from different people. Stubs are quick and easy and that's what I used early when I started contributing to the project, but since then have preferred mox/mock since they validate that methods are actually called with specific parameters, which can get lost when simply stubbing a method call out. In other words, if I'm stubbing a method and doing assertions within it (which you'll usually see), if that method is never called (maybe the code changed since the test was written), the assertions are lost and the test is essentially broken. So I think in general it's best to use mock now unless you have a good reason not to. On 11/10/2013 7:40 AM, Jay Lau wrote: Hi, I noticed that we are now using mock, mox and stub for unit test, just curious do we have any guidelines for this, in which condition shall we use mock, mox or stub? Thanks, Jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
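To make the stub-vs-mock failure mode described above concrete, here is a small self-contained sketch; the StorageDriver class and delete_volume function are invented purely for illustration:

    import unittest

    import mock  # the standalone mock library; unittest.mock on python 3

    class StorageDriver(object):
        def wipe(self, path):
            raise RuntimeError('the real wipe should not run in unit tests')

    def delete_volume(driver, path):
        # the call the tests below are trying to verify
        driver.wipe(path)

    class DeleteVolumeTest(unittest.TestCase):
        def test_stub_style(self):
            # Stub style: the assertion lives inside the stub. If
            # delete_volume() ever stops calling wipe(), fake_wipe never
            # runs, the assertion is never evaluated, and the test
            # silently passes - the broken-test problem described above.
            driver = StorageDriver()
            def fake_wipe(path):
                self.assertEqual('/dev/vg/lv1', path)
            driver.wipe = fake_wipe
            delete_volume(driver, '/dev/vg/lv1')

        def test_mock_style(self):
            # Mock style: the test fails loudly if wipe() is never
            # called, or is called with the wrong arguments.
            driver = StorageDriver()
            with mock.patch.object(driver, 'wipe') as mock_wipe:
                delete_volume(driver, '/dev/vg/lv1')
                mock_wipe.assert_called_once_with('/dev/vg/lv1')

    if __name__ == '__main__':
        unittest.main()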
Re: [openstack-dev] sqlalchemy-migrate needs a new release
I don't know what's all involved in putting out a release for sqlalchemy-migrate but if there is a way that I can help, please let me know. I'll try to catch dripton in IRC today. As for CI with DB2, it's in the blueprint as a work item, I just don't know enough about the infra side of things to get that going, so I'd need some help there. DB2 Express-C is the free version, which is what we plan to run the unit tests against in CI, but the only problem I see with that is it's a trial license and I wouldn't want to have to redo images or licenses every 3 months or however long it lasts. I would think that IBM would be able to provide a permanent license for CI though, otherwise our alternative is running the tests in-house and reporting the results back (something like what the nova virt drivers have to do and vmware is already doing). Thanks, Matt Riedemann On 11/12/2013 1:50 AM, Roman Podoliaka wrote: Hey David, Thank you for undertaking this task! I agree that merging of DB2 support can be postponed for now, even if it looks totally harmless (though I see no way to test it, as we don't have DB2 instances running on Infra test nodes). Thanks, Roman On Mon, Nov 11, 2013 at 10:54 PM, Davanum Srinivas dava...@gmail.com wrote: @dripton, @Roman Many thanks :) On Mon, Nov 11, 2013 at 3:35 PM, David Ripton drip...@redhat.com wrote: On 11/11/2013 11:37 AM, Roman Podoliaka wrote: As you may know, in our global requirements list [1] we are currently depending on SQLAlchemy 0.7.x versions (which is the 'old stable' branch and will be deprecated soon). This is mostly due to the fact that the latest release of sqlalchemy-migrate from PyPi doesn't support SQLAlchemy 0.8.x+. At the same time, distros have been providing patches for fixing this incompatibility for a long time now. Moreover, those patches have been merged to sqlalchemy-migrate master too. As we are now maintaining sqlalchemy-migrate, we could make a new release of it. This would allow us to bump the version of SQLAlchemy release we are depending on (as soon as we fix all the bugs we have) and let distros maintainers stop carrying their own patches. This has been discussed at the design summit [2], so we just basically need a volunteer from the [3] Gerrit ACL group to make a new release. Is sqlalchemy-migrate stable enough to make a new release? I think, yes. The commits we've merged since we adopted this library only fix a few issues with SQLAlchemy 0.8.x compatibility and enable running of tests (we are currently testing all new changes on py26/py27, SQLAlchemy 0.7.x/0.8.x, SQLite/MySQL/PostgreSQL). Who wants to help? :) Thanks, Roman [1] https://github.com/openstack/requirements/blob/master/global-requirements.txt [2] https://etherpad.openstack.org/p/icehouse-oslo-db-migrations [3] https://review.openstack.org/#/admin/groups/186,members I'll volunteer to do this release. I'll wait 24 hours from the timestamp of this email for input first. So, if anyone has opinions about the timing of this release, please speak up. (In particular, I'd like to do a release *before* Matt Riedemann's DB2 support patch https://review.openstack.org/#/c/55572/ lands, just in case it breaks anything. Of course we could do another release shortly after it gets in, to make folks who use DB2 happy.)
-- David Ripton Red Hat drip...@redhat.com ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Davanum Srinivas :: http://davanum.wordpress.com ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Do we have some guidelines for mock, stub, mox when writing unit test?
On 11/12/2013 5:04 PM, Chuck Short wrote: On Tue, Nov 12, 2013 at 4:49 PM, Mark McLoughlin mar...@redhat.com wrote: On Tue, 2013-11-12 at 16:42 -0500, Chuck Short wrote: Hi On Tue, Nov 12, 2013 at 4:24 PM, Mark McLoughlin mar...@redhat.com wrote: On Tue, 2013-11-12 at 13:11 -0800, Shawn Hartsock wrote: Maybe we should have some 60% rule... that is: If you change more than half of a test... you should *probably* rewrite the test in Mock. A rule needs a reasoning attached to it :) Why do we want people to use mock? Is it really for Python3? If so, I assume that means we've ruled out the python3 port of mox? (Ok by me, but would be good to hear why) And, if that's the case, then we should encourage whoever wants to port mox based tests to mock. The upstream maintainer is not going to port mox to python3 so we have a fork of mox called mox3. Ideally, we would drop the usage of mox in favour of mock so we don't have to carry a forked mox. Isn't that the opposite conclusion you came to here: http://lists.openstack.org/pipermail/openstack-dev/2013-July/012474.html i.e. using mox3 results in less code churn? Mark. Yes that was my original position, but I thought we agreed in the thread (further on) that we would use mox3 and then migrate to mock further on. Regards chuck ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev So it sounds like we're good with using mox for new tests again? Given Chuck got it into global-requirements here: https://github.com/openstack/requirements/commit/998dda263d7c7881070e3f16e4523ddcd23fc36d We can stave off the need to transition everything from mox to mock? I can't seem to find the nova blueprint to convert everything from mox to mock, maybe it was obsoleted already. Anyway, if mox(3) is OK and we don't need to use mock, it seems like we could add something to the developer guide here because I think this question comes up frequently: http://docs.openstack.org/developer/nova/devref/unit_tests.html Does anyone disagree? BTW, I care about this because I've been keeping in mind the mox/mock transition when doing code reviews and giving a -1 when new tests are using mox (since I thought that was a no-no now). -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
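For comparison with the mock sketch earlier in this digest, here is the same hypothetical test in mox's record/replay style; mox3 exposes the same API for python 3, and with the original library the import is just 'import mox'. As before, StorageDriver and delete_volume are invented for illustration:

    import unittest

    from mox3 import mox

    class StorageDriver(object):
        def wipe(self, path):
            raise RuntimeError('the real wipe should not run in unit tests')

    def delete_volume(driver, path):
        driver.wipe(path)

    class DeleteVolumeMoxTest(unittest.TestCase):
        def test_mox_style(self):
            m = mox.Mox()
            driver = StorageDriver()
            # record phase: declare the call we expect
            m.StubOutWithMock(driver, 'wipe')
            driver.wipe('/dev/vg/lv1')
            m.ReplayAll()
            try:
                # replay phase: run the code under test
                delete_volume(driver, '/dev/vg/lv1')
                # VerifyAll() fails if the recorded call never happened
                m.VerifyAll()
            finally:
                m.UnsetStubs()

    if __name__ == '__main__':
        unittest.main()

Functionally this is equivalent to mock's assert_called_once_with; the debate in this thread is only about which library new tests should standardize on.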
Re: [openstack-dev] [sqlalchemy-migrate] Blueprint for review: Add DB2 10.5 Support
Joe, Hey, I missed this question. I moved email accounts for the openstack-dev mailing list and missed this in my old pile. So I touched on this a bit in response here [1] and also a bit when talking about the plans for CI for the nova PowerVM virt driver here [2]. The blueprint for adding DB2 support to sqlalchemy-migrate and the DB2 enablement wiki [3] does call out CI. Getting the sqlalchemy-migrate unit tests to run against DB2 isn't that hard, I just haven't figured out if it's something I can do with community infrastructure or running as an external third party test, and I think whether we use Express-C or not would matter there since that has a trial license. I'm open to suggestions/comments/ideas. [1] http://lists.openstack.org/pipermail/openstack-dev/2013-November/018714.html [2] http://lists.openstack.org/pipermail/openstack-dev/2013-October/016395.html [3] https://wiki.openstack.org/wiki/DB2Enablement -- Thanks, Matt Riedemann From: Joe Gordon joe.gord...@gmail.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org, Date: 11/07/2013 09:41 PM Subject: Re: [openstack-dev] [sqlalchemy-migrate] Blueprint for review: Add DB2 10.5 Support With OpenStack's test- and gating-oriented mindset, how can we gate on this functionality working going forward? On Fri, Nov 1, 2013 at 3:30 AM, Matt Riedemann mrie...@us.ibm.com wrote: I've got a sqlalchemy-migrate blueprint up for review to add DB2 support in migrate. https://blueprints.launchpad.net/sqlalchemy-migrate/+spec/add-db2-support This is a pre-req for getting DB2 support into Nova so I'm targeting icehouse-1. We've been running with the migrate patches internally since Folsom, but getting them into migrate was difficult before OpenStack took over maintenance of the project. Please let me know if there are any questions/issues or something I need to address here. Thanks, Matt Riedemann Cloud Solutions and OpenStack Development Email: mrie...@us.ibm.com Office Phone: 507-253-7622 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [sqlalchemy-migrate] Blueprint for review: Add DB2 10.5 Support
On 11/14/2013 10:38 PM, Matt Riedemann wrote: Joe, Hey, I missed this question. I moved email accounts for the openstack-dev mailing list and missed this in my old pile. So I touched on this a bit in response here [1] and also a bit when talking about the plans for CI for the nova PowerVM virt driver here [2]. The blueprint for adding DB2 support to sqlalchemy-migrate and the DB2 enablement wiki [3] does call out CI. Getting the sqlalchemy-migrate unit tests to run against DB2 isn't that hard, I just haven't figured out if it's something I can do with community infrastructure or running as an external third party test, and I think whether we use Express-C or not would matter there since that has a trial license. I'm open to suggestions/comments/ideas. [1] http://lists.openstack.org/pipermail/openstack-dev/2013-November/018714.html [2] http://lists.openstack.org/pipermail/openstack-dev/2013-October/016395.html [3] https://wiki.openstack.org/wiki/DB2Enablement Thanks to Brant Bknudson for pointing out that DB2 Express-C doesn't have a time restriction: http://www.ibm.com/developerworks/downloads/im/db2express/ It is a fully licensed product available for free download. It does not have any time restrictions. I must have mistaken that with Enterprise Server Edition that we were using in house for some bigger deployments for CI with Tempest. So it sounds like Express-C is what we could use to get sqlalchemy-migrate unit tests running against DB2 using the community infrastructure (I hope), I just need some help with getting that going. I know Roman got the migrate UT running for MySQL and PostgreSQL here: https://review.openstack.org/#/c/40436/ I'll try working with Roman, Monty and any infra guys that will talk to me to get this going. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [sqlalchemy-migrate] Blueprint for review: Add DB2 10.5 Support
On 11/15/2013 10:15 AM, Matt Riedemann wrote: On 11/14/2013 10:38 PM, Matt Riedemann wrote: Joe, Hey, I missed this question. I moved email accounts for the openstack-dev mailing list and missed this in my old pile. So I touched on this a bit in response here [1] and also a bit when talking about the plans for CI for the nova PowerVM virt driver here [2]. The blueprint for adding DB2 support to sqlalchemy-migrate and the DB2 enablement wiki [3] does call out CI. Getting the sqlalchemy-migrate unit tests to run against DB2 isn't that hard, I just haven't figured out if it's something I can do with community infrastructure or running as an external third party test, and I think whether we use Express-C or not would matter there since that has a trial license. I'm open to suggestions/comments/ideas. [1] http://lists.openstack.org/pipermail/openstack-dev/2013-November/018714.html [2] http://lists.openstack.org/pipermail/openstack-dev/2013-October/016395.html [3] https://wiki.openstack.org/wiki/DB2Enablement Thanks to Brant Bknudson for pointing out that DB2 Express-C doesn't have a time restriction: http://www.ibm.com/developerworks/downloads/im/db2express/ It is a fully licensed product available for free download. It does not have any time restrictions. I must have mistaken that with Enterprise Server Edition that we were using in house for some bigger deployments for CI with Tempest. So it sounds like Express-C is what we could use to get sqlalchemy-migrate unit tests running against DB2 using the community infrastructure (I hope), I just need some help with getting that going. I know Roman got the migrate UT running for MySQL and PostgreSQL here: https://review.openstack.org/#/c/40436/ I'll try working with Roman, Monty and any infra guys that will talk to me to get this going. Just to circle back on this before anyone throws in their two cents and tells me that 3rd party CI is the way to go, I caught Monty in IRC and came to that conclusion already. While DB2 Express-C is free and doesn't expire, it's closed source so it's an issue of the infra team being able to maintain it, and that there is no closed source code running in the community infrastructure. So I'll plan on getting the sqlalchemy-migrate unit tests reporting back for DB2 using 3rd party CI and triggers. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Split of the openstack-dev list (summary so far)
of nova-network? (3 messages) [openstack-dev] [Nova] New API requirements, review of GCE (6 messages) [openstack-dev] how can I know a new instance is created from the code ? (3 messages) [openstack-dev] [Nova] Icehouse Blueprints (2 messages) [openstack-dev] [Solum/Heat] Is Solum really necessary? (14 messages) [openstack-dev] Nova XML serialization bug 1223358 moving discussion here to get more people involved (4 messages) [openstack-dev] [RFC] Straw man to start the incubation / graduation requirements discussion (11 messages) [openstack-dev] [Savanna] DiskBuilder / savanna-image-elements (4 messages) [openstack-dev] [Keystone] Blob in keystone v3 certificate API (2 messages) [openstack-dev] [oslo] team meeting Friday 15 November @ 14:00 UTC (2 messages) [openstack-dev] [Trove][Savanna][Murano] Unified Agent proposal discussion at Summit (6 messages) [openstack-dev] [oslo] tracking graduation status for incubated code [openstack-dev] [OpenStack-dev][Neutron][Tempest]Can Tempest embrace some complicated network scenario tests (3 messages) [openstack-dev] [nova][cinder][oslo][scheduler] How to leverage oslo schduler/filters for nova and cinder (6 messages) [openstack-dev] [Nova] Hypervisor CI requirement and deprecation plan [openstack-dev] [Ceilometer] compute agent cannot start (7 messages) [openstack-dev] [Horizon] Use icon set instead of instance Action (4 messages) [openstack-dev] [OpenStack][Horizon] poweroff/shutdown action in horizon (3 messages) [openstack-dev] [Murano] Implementing Elastic Applications (3 messages) Now - tell me in the above list where the mass of StackForge related email overwhelming madness is coming from. I count 4 topics and 26 messages out of a total of 44 topics and 328 messages. So - before we take the extreme move of segregation, can we just try threaded mail readers for a while and see if it helps? Monty ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Thanks for the tip Monty. I just started using Thunderbird last week and already had my tags sorting most of the dev list into folders, but just installed the Conversations add-on to further clean things up. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [stable/havana] gate broken
On Sunday, November 17, 2013 7:46:39 AM, Gary Kotton wrote: Hi, The gating for the stable version is broken when running the neutron gate. Locally this works but the gate has a problem. All of the services are up and running correctly. There are some exceptions with the ceilometer service but that is not related to the neutron gating. The error message is as follows: 2013-11-17 11:00:05.855 http://logs.openstack.org/46/56746/1/check/check-tempest-devstack-vm-neutron/a02894b/console.html#_2013-11-17_11_00_05_855 | 2013-11-17 11:00:05 2013-11-17 11:00:17.239 http://logs.openstack.org/46/56746/1/check/check-tempest-devstack-vm-neutron/a02894b/console.html#_2013-11-17_11_00_17_239 | Process leaked file descriptors. See http://wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build for more information 2013-11-17 11:00:17.437 http://logs.openstack.org/46/56746/1/check/check-tempest-devstack-vm-neutron/a02894b/console.html#_2013-11-17_11_00_17_437 | Build step 'Execute shell' marked build as failure 2013-11-17 11:00:19.129 http://logs.openstack.org/46/56746/1/check/check-tempest-devstack-vm-neutron/a02894b/console.html#_2013-11-17_11_00_19_129 | [SCP] Connecting to static.openstack.org Thanks Gary ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I've seen this fail on at least two stable/havana patches in nova today, so I opened this bug: https://bugs.launchpad.net/openstack-ci/+bug/1252024 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] [api] How to handle bug 1249526?
This is mainly just a newbie question but looks like it could be an easy fix. The bug report is just asking for the nova os-fixed-ips API extension to return the 'reserved' status for the fixed IP. I don't see that in the v3 API list though, was that dropped in V3? If it's not being ported to V3 I'm sure there was a good reason so maybe this isn't worth implementing in the V2 API, even though it seems like a pretty harmless backwards compatible change. Am I missing something here? -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
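For reference, the v2 os-fixed-ips extension today returns something shaped like this (per the nova v2 API samples; the values here are illustrative):

    GET /v2/{tenant_id}/os-fixed-ips/192.168.1.1

    {
        "fixed_ip": {
            "address": "192.168.1.1",
            "cidr": "192.168.1.0/24",
            "host": "host1",
            "hostname": "compute1"
        }
    }

The bug is asking for one additional boolean field, e.g. "reserved": false, in that dict, which is why it reads as a harmless, backwards-compatible change.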
[openstack-dev] [infra] How to determine patch set load for a given project
We have a team working on getting CI set up for DB2 10.5 in sqlalchemy-migrate and they were asking me if there was a way to calculate the patch set load for that project. I asked around in the infra IRC channel and Jeremy Stanley pointed out that there might be something available in http://graphite.openstack.org/ by looking for the project's test stats. I found that if you expand stats_counts, then zuul, then job, and then search for your project (sqlalchemy-migrate in this case), you can find the jobs and their graphs for load. In my case I care about stats for gate-sqlalchemy-migrate-python27. I'm having a little trouble interpreting the data though. From looking at what's out there for review now, there is one new patch created on 11/19 and the last new one before that was on 11/15. I see spikes in the graph around 11/15, 11/18 and 11/19, but I'm not sure what the 11/18 spike is from? -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] VNC issue with multi compute node with openstack havana
On Wednesday, November 20, 2013 7:49:50 AM, Vikash Kumar wrote: Hi, I used devstack Multi-Node + VLANs to install openstack-havana recently. Installation was successful and I verified basic things like vm launch and ping between vm's. I have two nodes: 1. Ctrl+Compute 2. Compute The VM which gets launched on the second compute node (2 above) doesn't get a vnc console. I tried to access it from both horizon and the URL given by nova-cli. The *n-novnc* screen on the first node, which is the controller (1 above), gave this error log: Traceback (most recent call last): File /usr/local/lib/python2.7/dist-packages/websockify/websocket.py, line 711, in top_new_client self.new_client() File /opt/stack/nova/nova/console/websocketproxy.py, line 68, in new_client tsock = self.socket(host, port, connect=True) File /usr/local/lib/python2.7/dist-packages/websockify/websocket.py, line 188, in socket sock.connect(addrs[0][4]) File /usr/local/lib/python2.7/dist-packages/eventlet/greenio.py, line 192, in connect socket_checkerr(fd) File /usr/local/lib/python2.7/dist-packages/eventlet/greenio.py, line 46, in socket_checkerr raise socket.error(err, errno.errorcode[err]) error: [Errno 111] ECONNREFUSED The vnc related configuration in nova.conf on the Ctrl+Compute node: vncserver_proxyclient_address = 127.0.0.1 vncserver_listen = 127.0.0.1 vnc_enabled = true xvpvncproxy_base_url = http://192.168.2.151:6081/console novncproxy_base_url = http://192.168.2.151:6080/vnc_auto.html and on the second Compute node: /* I corrected the IP of the first two addresses; by default they are set to 127.0.0.1 */ vncserver_proxyclient_address = 192.168.2.157 vncserver_listen = 0.0.0.0 vnc_enabled = true xvpvncproxy_base_url = http://192.168.2.151:6081/console novncproxy_base_url = http://192.168.2.151:6080/vnc_auto.html I also added the hostname of the compute node to the hosts file on the controller node. With this, ERROR 111 went away and a new error came: connecting to: 192.168.2.157:-1 7: handler exception: [Errno -8] Servname not supported for ai_socktype 7: Traceback (most recent call last): File /usr/local/lib/python2.7/dist-packages/websockify/websocket.py, line 711, in top_new_client self.new_client() File /opt/stack/nova/nova/console/websocketproxy.py, line 68, in new_client tsock = self.socket(host, port, connect=True) File /usr/local/lib/python2.7/dist-packages/websockify/websocket.py, line 180, in socket socket.IPPROTO_TCP, flags) gaierror: [Errno -8] Servname not supported for ai_socktype What needs to be done to resolve this? Thanks ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev This mailing list is for development discussion only. For support, you should go to the general mailing list: https://wiki.openstack.org/wiki/Mailing_Lists#General_List -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Diagnostic] Diagnostic API: summit follow-up
On Wednesday, November 20, 2013 7:52:39 AM, Oleg Gelbukh wrote: Hi, fellow stackers, There was a conversation during the 'Enhance debugability' session at the summit about a Diagnostic API which allows the gate to get the 'state of the world' of an OpenStack installation. The 'state of the world' includes hardware- and operating-system-level configurations of the servers in a cluster. This info would help to compare the expected effect of tests on a system with its actual state, thus providing Tempest with the ability to see into it (whitebox tests) as one possible use case. Another use case is to provide input for validation of OpenStack configuration files. We're putting together an initial version of the data model of the API with example values in the following etherpad: https://etherpad.openstack.org/p/icehouse-diagnostic-api-spec This version covers most hardware and system-level configurations managed by OpenStack in a Linux system. What is missing from there? What information would you like to see in such an API? Please feel free to share your thoughts on the ML, or in the etherpad directly. -- Best regards, Oleg Gelbukh Mirantis Labs ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Hi Oleg, There has been some discussion over the nova virtapi's get_diagnostics method. The background is in a thread from October [1]. The timing is pertinent since the VMware team is working on implementing that API for their nova virt driver [2]. The main issue is there is no consistency between the nova virt drivers in how they would implement the get_diagnostics API; they only return information that is hypervisor-specific. The API docs and the current Tempest test cover the libvirt driver's implementation, but wouldn't work for, say, the xen, vmware or powervm drivers. I think the solution right now is to namespace the keys in the dict that is returned from the API so a caller could at least check for that and know how to handle processing the result, but it's not ideal. Does your solution take into account the nova virtapi's get_diagnostics method? [1] http://lists.openstack.org/pipermail/openstack-dev/2013-October/016385.html [2] https://review.openstack.org/#/c/51404/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
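To make the namespacing idea concrete: the libvirt driver today returns flat, libvirt-shaped keys (the example keys below are the ones shown in the API docs), and the proposal is to prefix them with the driver name so a caller can tell who produced them. The separator used here is an assumption:

    # what the libvirt driver returns today
    {'cpu0_time': 17300000000, 'memory': 524288,
     'vda_read': 262144, 'vda_write': 5778432, 'vnet1_rx': 2070139}

    # one possible namespaced form
    {'libvirt:cpu0_time': 17300000000, 'libvirt:memory': 524288,
     'libvirt:vda_read': 262144, 'libvirt:vda_write': 5778432,
     'libvirt:vnet1_rx': 2070139}

A caller (Tempest, ceilometer) could then branch on the key prefix instead of guessing the producer from the key shape.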
Re: [openstack-dev] The recent gate performance and how it affects you
On Wednesday, November 20, 2013 2:44:52 PM, Clark Boylan wrote: Joe Gordon has been doing great working tracking test failures and how often they affect us. Post Havana release the failure rate has increased dramatically, negatively affecting the gate and forcing it to run in a near worst case scenario. That is changes are being tested in parallel but the head of the queue is more often than not running into a failed job forcing all changes behind it to be retested and so on. This led to a gate queue 130 deep with the head of the queue 18 hours behind its approval. We have identified fixes for some of the worst current bugs and in order to get them in have restarted Zuul effectively cancelling the gate queue and have queued these changes up at the front of the qeueue. Once these changes are in and we are happy with the bug fixing results we will requeue changes that were in the queue when it got cancelled. How do we avoid this in the future? Step one is reviewers that are approving changes (or reverifying them) should keep an eye on the gate queue. If it is struggling adding more changes to that queue problably won't help. Instead we should focus on identifying the bugs, submitting changes to elastic-recheck to track these bugs, and work towards fixing the bugs. Everyone is affected by persistent gate failures, we need to work together to fix them. Thank you for your patience, Clark ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Let me also say that I think it's really helpful that Joe has been sending out recaps to the mailing list about the top offenders so people can help pitch in on investigating and fixing those (like we saw with the Neutron team's response to Joe's recent post about the top gate failures). People get heads-down in their own projects and what they are working on and it's hard to keep up with what's going on in the infra channel (or nova channel for that matter), so sending out a recap that everyone can see in the mailing list is helpful to reset where things are at and focus possibly various isolated investigations (as we saw happen this week). -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Diagnostic] Diagnostic API: summit follow-up
On 11/20/2013 9:35 PM, Lingxian Kong wrote: hi Matt: noticed there is no consensus there[1], any progress outside the ML? [1] http://lists.openstack.org/__pipermail/openstack-dev/2013-__October/016385.html http://lists.openstack.org/pipermail/openstack-dev/2013-October/016385.html 2013/11/21 Oleg Gelbukh ogelb...@mirantis.com mailto:ogelb...@mirantis.com Matt, Thank you for bringing this up. I've been following this thread and the idea is somewhat aligned with our approach, but we'd like to take one step further. In this Diagnostic API, we want to collect information about system state from sources outside to OpenStack. We'd probably should extract this call from Nova API and use it in our implementation to get hypervisor-specific information about virtual machines which exist on the node. But the idea is to get vision into the system state alternative to that provided by OpenStack APIs. May be we should reconsider our naming to avoid confusion and call this Instrumentation API or something like that? -- Best regards, Oleg Gelbukh On Wed, Nov 20, 2013 at 6:45 PM, Matt Riedemann mrie...@linux.vnet.ibm.com mailto:mrie...@linux.vnet.ibm.com wrote: On Wednesday, November 20, 2013 7:52:39 AM, Oleg Gelbukh wrote: Hi, fellow stackers, There was a conversation during 'Enhance debugability' session at the summit about Diagnostic API which allows gate to get 'state of world' of OpenStack installation. 'State of world' includes hardware- and operating system-level configurations of servers in cluster. This info would help to compare the expected effect of tests on a system with its actual state, thus providing Tempest with ability to see into it (whitebox tests) as one of possible use cases. Another use case is to provide input for validation of OpenStack configuration files. We're putting together an initial version of data model of API with example values in the following etherpad: https://etherpad.openstack.__org/p/icehouse-diagnostic-api-__spec https://etherpad.openstack.org/p/icehouse-diagnostic-api-spec This version covers most hardware and system-level configurations managed by OpenStack in Linux system. What is missing from there? What information you'd like to see in such an API? Please, feel free to share your thoughts in ML, or in the etherpad directly. -- Best regards, Oleg Gelbukh Mirantis Labs _ OpenStack-dev mailing list OpenStack-dev@lists.openstack.__org mailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/__cgi-bin/mailman/listinfo/__openstack-dev http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Hi Oleg, There has been some discussion over the nova virtapi's get_diagnostics method. The background is in a thread from October [1]. The timing is pertinent since the VMware team is working on implementing that API for their nova virt driver [2]. The main issue is there is no consistency between the nova virt drivers and how they would implement the get_diagnostics API, they only return information that is hypervisor-specific. The API docs and current Tempest test covers the libvirt driver's implementation, but wouldn't work for say xen, vmware or powervm drivers. I think the solution right now is to namespace the keys in the dict that is returned from the API so a caller could at least check for that and know how to handle processing the result, but it's not ideal. Does your solution take into account the nova virtapi's get_diagnostics method? 
[1] http://lists.openstack.org/__pipermail/openstack-dev/2013-__October/016385.html http://lists.openstack.org/pipermail/openstack-dev/2013-October/016385.html [2] https://review.openstack.org/#__/c/51404/ https://review.openstack.org/#/c/51404/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org mailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- *---* *Lingxian Kong* Huawei Technologies Co.,LTD. IT Product Line CloudOS PDU China, Xi'an Mobile: +86-18602962792 Email: konglingx...@huawei.com mailto:konglingx...@huawei.com
Re: [openstack-dev] Top Gate Bugs
+topic:57578,n,z but went far enough to revert the change that introduced that test. A couple people were going to keep hitting those changes to run them through more tests and see if 1251920 goes away. I don't quite understand why this test is problematic (Joe indicated it went in at about the time 1251920 became a problem). I would be very interested in finding out why this caused a problem. You can see frequencies for bugs with known signatures at http://status.openstack.org/elastic-recheck/ Hope this helps. Clark ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Joe is tracking some notes in an etherpad here: https://etherpad.openstack.org/p/critical-patches-gatecrash-November-2013 I've added https://review.openstack.org/#/c/57069/ and https://review.openstack.org/#/c/57042/ to the list. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][Glance] Support of v1 and v2 glance APIs in Nova
client. But that seems better than having that code in nova. I know in Glance we've largely taken the view that the client should be as thin and lightweight as possible so users of the client can make use of it however they best see fit. There was an earlier patch that would have moved the whole image service layer into glanceclient that was rejected. So I think there is a division in philosophies here as well Hmm, I would be a fan of supporting both use cases, nova style and more complex. Just seems better for glance to own as much as possible of the glance client-like code. But I am a nova guy, I would say that! Anyway, that's a different conversation. John ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org mailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I'm joining this thread a bit late but wanted to raise a few points for consideration. 1. It doesn't look like the 'use-glance-v2-api' blueprint [1] has gone anywhere since this thread seems to have hit a dead-end. 2. There is a blueprint [2] for nova supporting the cinder v2 API now too and the related review is actually defaulting to use v2, so given the history on this with the glance discussion, I think it's relevant to drop it into the same conversation. 3. As for the keystone service catalog being used to abstract some of this, there was a related blueprint [3] for abstracting the glance URI that nova would talk to. The blueprint was closed because I think Joe Gordon had something else cooking for enhancing the keystone service catalog, but there weren't any details put into the closed blueprint that Yang Yu opened. Where are we with that? I plan on bringing this up as a blueprint topic in today's nova meeting. [1] https://blueprints.launchpad.net/nova/+spec/use-glance-v2-api [2] https://blueprints.launchpad.net/nova/+spec/support-cinderclient-v2 [3] https://blueprints.launchpad.net/nova/+spec/nova-enable-glance-arbitrary-url -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][Glance] Support of v1 and v2 glance APIs in Nova
On Thursday, November 21, 2013 9:56:41 AM, Matt Riedemann wrote: On 11/3/2013 5:22 AM, Joe Gordon wrote: On Nov 1, 2013 6:46 PM, John Garbutt j...@johngarbutt.com mailto:j...@johngarbutt.com wrote: On 29 October 2013 16:11, Eddie Sheffield eddie.sheffi...@rackspace.com mailto:eddie.sheffi...@rackspace.com wrote: John Garbutt j...@johngarbutt.com mailto:j...@johngarbutt.com said: Going back to Joe's comment: Can both of these cases be covered by configuring the keystone catalog? +1 If both v1 and v2 are present, pick v2, otherwise just pick what is in the catalogue. That seems cool. Not quite sure how the multiple glance endpoints works in the keystone catalog, but should work I assume. We hard code nova right now, and so we probably want to keep that route too? Nova doesn't use the catalog from Keystone when talking to Glance. There is a config value glance_api_servers which defines a list of Glance servers that gets randomized and cycled through. I assume that's what you're referring to with we hard code nova. But currently there's nowhere in this path (internal nova to glance) where the keystone catalog is available. Yes. I was not very clear. I am proposing we change that. We could try shoehorn the multiple glance nodes in the keystone catalog, then cache that in the context, but maybe that doesn't make sense. This is a separate change really. FYI: We cache the cinder endpoints from keystone catalog in the context already. So doing something like that with glance won't be without president. But clearly, we can't drop the direct configuration of glance servers for some time either. I think some of the confusion may be that Glanceclient at the programmatic client level doesn't talk to keystone. That happens happens higher in the CLI level which doesn't come into play here. From: Russell Bryant rbry...@redhat.com mailto:rbry...@redhat.com On 10/17/2013 03:12 PM, Eddie Sheffield wrote: Might I propose a compromise? 1) For the VERY short term, keep the config value and get the change otherwise reviewed and hopefully accepted. 2) Immediately file two blueprints: - python-glanceclient - expose a way to discover available versions - nova - depends on the glanceclient bp and allowing autodiscovery of glance version and making the config value optional (tho not deprecated / removed) Supporting both seems reasonable. At least then *most* people don't need to worry about it and it just works, but the override is there if necessary, since multiple people seem to be expressing a desire to have it available. +1 Can we just do this all at once? Adding this to glanceclient doesn't seem like a huge task. I worry about us never getting the full solution, but it seems to have got complicated. The glanceclient side is done, as far as allowing access to the list of available API versions on a given server. It's getting Nova to use this info that's a bit sticky. Hmm, OK. Could we not just cache the detected version, to reduce the impact of that decision. On 28 October 2013 15:13, Eddie Sheffield eddie.sheffi...@rackspace.com mailto:eddie.sheffi...@rackspace.com wrote: So...I've been working on this some more and hit a bit of a snag. The Glanceclient change was easy, but I see now that doing this in nova will require a pretty huge change in the way things work. Currently, the API version is grabbed from the config value, the appropriate driver is instantiated, and calls go through that. The problem comes in that the actually glance server isn't communicated with until very late in the process. 
Nothing sees the servers at the level where the driver is determined. Also there isn't a single glance server but a list of them, and in the even of certain communication failures the list is cycled through until success or a number of retries has passed. So to change this to auto configuring will require turning this upside down, cycling through the servers at a higher level, choosing the appropriate driver for that server, and handling retries at that same level. Doable, but a much larger task than I first was thinking. Also, I don't really want the added overhead of getting the api versions before every call, so I'm thinking that going through the list of servers at startup and discovering the versions then and caching that somehow would be helpful as well. Thoughts? I do worry about that overhead. But with Joe's comment, does it not just boil down to caching the keystone catalog in the context? I am not a fan of all the specific talk to glance code we have in nova, moving more of that into glanceclient can only be a good thing. For the XenServer itegration, for efficiency reasons, we need glance to talk from dom0, so it has dom0 making the final HTTP call
[openstack-dev] [nova] who wants to own docker bug triage?
Going through nova bug triage today I noticed a pretty straight-forward untagged bug for docker but then noticed we didn't have a docker tag in our bug tag table [1]. I went ahead and added one and the queries show a decent number of results, so people were already using the tag. The question is, who wants their name in the box next to that tag for owning triage? [1] https://wiki.openstack.org/wiki/Nova/BugTriage#Step_2:_Triage_Tagged_Bugs -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] who wants to own docker bug triage?
On Saturday, November 23, 2013 3:28:28 PM, Robert Collins wrote: Cool; also, if it's not, we should add that as an official tag so that it type-completes in LP. On 24 November 2013 10:21, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: Going through nova bug triage today I noticed a pretty straight-forward untagged bug for docker but then noticed we didn't have a docker tag in our bug tag table [1]. I went ahead and added one and the queries show a decent number of results, so people were already using the tag. The question is, who wants their name in the box next to that tag for owning triage? [1] https://wiki.openstack.org/wiki/Nova/BugTriage#Step_2:_Triage_Tagged_Bugs -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Good idea. I don't know how to do that though. Any guides I can follow to make that happen? -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Tempest] Drop python 2.6 support
On Monday, November 25, 2013 7:35:51 AM, Zhi Kun Liu wrote: Hi all, I saw that Tempest will drop python 2.6 support in design summit https://etherpad.openstack.org/p/icehouse-summit-qa-parallel. Drop tempest python 2.6 support:Remove all nose hacks in the code Delete nose, use unittest2 with testr/testtools and everything *should* just work (tm) Does that mean Tempest could not run on python 2.6 in the future? -- Regards, Zhi Kun Liu ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Well so if you're running a single-node setup of OpenStack on a VM on top of RHEL 6 and running Tempest from there, yeah, this is an inconvenience, but it's a pretty simple fix, right? I just run my OpenStack RHEL 6 VM and have an Ubuntu 12.04 or Fedora 19 or whatever distro-that-supports-py27 I want running Tempest against it. Am I missing something? FWIW, trying to keep up with the changes in Tempest when you're running on python 2.6 is no fun, especially with how tests are skipped (skipException causes a test failure if you don't have a special environment variable set). Plus you don't get parallel execution of the tests. So I agree with the approach even though it's going to hurt me in the short-term. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Hypervisor CI requirement and deprecation plan
On 11/15/2013 9:28 AM, Dan Smith wrote: Hi all, As you know, Nova adopted a plan to require CI testing for all our in-tree hypervisors by the Icehouse release. At the summit last week, we determined the actual plan for deprecating non-compliant drivers. I put together a page detailing the specific requirements we're putting in place as well as a plan and timeline for how the deprecation process will proceed: https://wiki.openstack.org/wiki/HypervisorSupportMatrix/DeprecationPlan I also listed the various drivers and whether we've heard any concrete plans from them. Driver owners should feel free to add details to that and correct any of the statements if incorrect. Thanks! --Dan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I'll play devil's advocate here and ask this question before someone else does. I'm assuming that the requirement of a 'full' tempest run means running this [1]. Is that correct? It's just confusing sometimes because there are other things in Tempest that aren't in the 'full' run, like stress tests. Assuming that's what 'full' means, it's running API, CLI, third party (boto), and scenario tests. Does it make sense to require a nova virt driver's CI to run API tests for keystone, heat and swift? Or couldn't the nova virt driver CI be scoped down to just the compute API tests? The argument against that is probably that the network/image/volume tests may create instances using nova to do their API testing also. The same would apply for the CLI tests since those are broken down by service, i.e. why would I need to run keystone and ceilometer CLI tests for a nova virt driver? If nothing else, I think we could firm up the wording on the wiki a bit around the requirements and what that means for scope. [1] https://github.com/openstack/tempest/blob/master/tox.ini#L33 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Hypervisor CI requirement and deprecation plan
On Monday, November 25, 2013 4:37:29 PM, Russell Bryant wrote: On 11/25/2013 05:19 PM, Matt Riedemann wrote: I'll play devil's advocate here and ask this question before someone else does. I'm assuming that the requirement of a 'full' tempest run means running this [1]. Is that correct? It's just confusing sometimes because there are other things in Tempest that aren't in the 'full' run, like stress tests. Assuming that's what 'full' means, it's running API, CLI, third party (boto), and scenario tests. Does it make sense to require a nova virt driver's CI to run API tests for keystone, heat and swift? Or couldn't the nova virt driver CI be scoped down to just the compute API tests? The argument against that is probably that the network/image/volume tests may create instances using nova to do their API testing also. The same would apply for the CLI tests since those are broken down by service, i.e. why would I need to run keystone and ceilometer CLI tests for a nova virt driver? If nothing else, I think we could firm up the wording on the wiki a bit around the requirements and what that means for scope. [1] https://github.com/openstack/tempest/blob/master/tox.ini#L33 I think the short answer is, whatever we're running against all Nova changes in the gate. Maybe a silly question, but is what is run against the check queue any different from the gate queue? I expect that for some drivers, a more specific configuration is going to be needed to exclude tests for features not implemented in that driver. That's fine. Soon we also need to start solidifying criteria for what features *must* be implemented in a driver. I think we've let some drivers in with far too many features not supported. That's a separate issue from the CI requirement, though. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Hypervisor CI requirement and deprecation plan
On Tuesday, November 26, 2013 10:07:02 AM, Sean Dague wrote: On 11/26/2013 09:56 AM, Russell Bryant wrote: On 11/26/2013 09:38 AM, Bob Ball wrote: -Original Message- From: Russell Bryant [mailto:rbry...@redhat.com] Sent: 26 November 2013 13:56 To: openstack-dev@lists.openstack.org Cc: Sean Dague Subject: Re: [openstack-dev] [Nova] Hypervisor CI requirement and deprecation plan On 11/26/2013 04:48 AM, Bob Ball wrote: I hope we can safely say that we should run against all gating tests which require Nova? Currently we run quite a number of tests in the gate that succeed even when Nova is not running as the gate isn't just for Nova but for all projects. Would you like to come up with a more detailed proposal? What tests would you cut, and how much time does it save? I don't have a detailed proposal yet - but it's very possible that we'll want one in the coming weeks. In terms of the time saved, I noticed that a tempest smoke run with Nova absent took 400 seconds on one of my machines (a particularly slow one) - so I imagine that would translate to maybe a 300 second / 5 minute reduction in overall time. Total smoke took approximately 800 seconds on the same machine. I don't think the smoke tests are really relevant here. That's not related to Nova vs non-Nova tests, right? If the approach could be acceptable then yes, I'm happy to come up with a detailed set of tests that I would propose cutting. My primary hesitation with the approach is it would need Tempest reviewers to be aware of this extra type of test, and flag up if a test is added to the full tempest suite which should also be in the nova tempest suite. Right now I don't think it's acceptable. I was suggesting a more detailed proposal to help convince me. :-) So we already have the beginnings of service tags in Tempest, that would let you slice exactly like this. I don't think the infrastructure is fully complete yet, but the idea being that you could run the subset of tests that interact with compute or networking in any real way. Realize... that's not going to drop that many tests for something like compute, it's touched a lot. -Sean ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Good to know about the service tags, I think I remember being broken at some point after those tempest.conf.sample changes. :) My overall concern, and I think the other guys doing this for virt drivers will agree, is trying to scope down the exposure to unrelated failures. For example, if there is a bug in swift breaking the gate, it could start breaking the nova virt driver CI as well. When things get bad in the gate, it takes some monstrous effort to rally people across the projects to come together to unblock it (like what Joe Gordon was doing last week). I'm running Tempest internally about once per day when we rebase code with the community and that's to cover running with the PowerVM driver for nova, Storwize driver for cinder, OVS for neutron, with qpid and DB2. We're running almost a full run except for the third party boto tests and swift API tests. The thing is, when something fails, I have to figure out if it's environmental (infra), a problem with tempest (think instability with neutron in the gate), a configuration issue, or a code bug. That's a lot for one person to have to cover, even a small team. 
That's why at some points we just have to ignore/exclude tests that continuously fail but we can't figure out (think intermittent gate breaker bugs that are open for months). Now multiply this out across all the nova virt drivers, the neutron plugins and I'm assuming at some point the various glance backends and cinder drivers (haven't heard if they are planning on the same types of CI requirements yet). I think either we're going to have a lot of flaky/instable driver CI going on so the scores can't be trusted, or we're going to develop a lot of people that get really good at infra/QA (which would be a plus in the long-run, but maybe not what those teams set out to be). I don't have any good answers, I'm just trying to raise the issue since this is complicated. I think it's also hard for people that aren't forced to invest in infra/QA on a daily basis to understand and appreciate the amount of effort it takes just to keep the wheels spinning, so I want to keep expectations at a reasonable level. Don't get me wrong, I absolutely agree with requiring third party CI for the various vendor-specific drivers and plugins, that's a no-brainer for openstack to scale. I think it will just be very interesting to see the kinds of results coming out of all of these disconnected teams come icehouse-3. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org
Re: [openstack-dev] Working with Vagrant and packstack
On Friday, November 29, 2013 2:16:23 AM, Peeyush Gupta wrote: Hi all, I have been trying to set up an openstack environment using vagrant and packstack. I provisioned a Fedora-19 VM through vagrant and used a shell script to take care of installation and other things. The first thing that shell script does is yum install -y openstack-packstack and then packstack --allinone. Now, the issue is that the second command requires me to enter the root's password explicitly. I mean it doesn't matter if I am running this as root or using sudo, I have to enter the password explicitly everytime. I tried to pass the password to the VM through pipes and other methods, but nothing works. Did anyone face the same problem? Is there any way around this? Or does it mean that I can't use puppet/packstack with vagrant? Thanks, ~Peeyush Gupta ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev This sounds like a support question so it should be posted to the general mailing list: https://wiki.openstack.org/wiki/Mailing_Lists#General_List The openstack-dev list is for development discussion topics. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Do we have some guidelines for mock, stub, mox when writing unit test?
on this soon IMHO, as this comes up with literally every commit. Cheers, Nikola [1] https://review.openstack.org/#/c/59694/ [2] https://pypi.python.org/pypi/mox [3] https://pypi.python.org/pypi/mox3/0.7.0 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Top Gate Bugs
about the need for having a new column in the Service table for indicating whether or not the service was automatically disabled, as Phil Day points out in bug 1250049 [6]. That way the ComputeFilter in the scheduler could handle that case a bit differently, at least from a logging/serviceability standpoint, e.g. info/warning level message vs debug. [1] https://bugs.launchpad.net/nova/+bug/1257644 [2] https://review.openstack.org/#/c/52189/ [3] https://review.openstack.org/#/c/56224/ [4] https://bugs.launchpad.net/nova/+bug/1254872 [5] http://www.redhat.com/archives/libvir-list/2012-July/msg01675.html [6] https://bugs.launchpad.net/nova/+bug/1250049 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [qa][keystone] Keystoneclient tests to tempest
On Sunday, December 08, 2013 11:26:07 AM, Brant Knudson wrote: We'd like to get the keystoneclient tests out of keystone. They're serving a useful purpose of catching problems with non-backwards compatible changes in keystoneclient so we still want them run. Problem is they're running at the wrong time -- only on changes to keystone and not changes to keystoneclient. The tests need to be run: When keystoneclient changes - run the tests against the change When the tests change - run the change against the current keystoneclient and also old clients When keystone changes - run the tests against the change with current client So here's what I think we need to do to get keystone client tests out of keystone: 1) Figure out where to put the tests - is it tempest or something else? 2) Write up a test and put it there 3) Have a job that when there's a change in the tests it runs against current client lib 4) Expand the job to also run against old clients - or is there 1 job per version? - what versions? (keystone does master, essex-3, and 0.1.1) - e.g. tox -e master,essex-3,0.1.1 - suggest start with these versions and then consider what to use in future 5) Now we can start adding tests 6) Have a job that when there's a change in keystoneclient it runs these tests against the change 7) When there's a change in keystone, run these tests against the change 8) Copy the keystoneclient tests from keystone to the new location -- will require some changes 9) Remove the tests from keystone \o/ 10) Move tests back to keystone where makes sense -- use webtest like v3 tests I created an etherpad with this same info so it's easier to discuss: https://etherpad.openstack.org/p/KeystoneTestsToTempest - Brant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I'll ask the super obvious question, why not move the keystoneclient tests to keystoneclient? -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][QA] Disabling the v3 API tests in the gate
On 6/12/2014 12:40 AM, Christopher Yeoh wrote: On Thu, Jun 12, 2014 at 7:30 AM, Matthew Treinish mtrein...@kortar.org mailto:mtrein...@kortar.org wrote: Hi everyone, As part of debugging all the bugs that have been plaguing the gate the past couple of weeks one of the things that came up is that we're still running the v3 API tests in the gate. AIUI at summit Nova decided that the v3 API test won't exist as a separate major version. So I'm not sure there is much value in continuing to run the API tests. So the v3 API won't exist as a separate major version, but I think its very important we keep up with the tempest tests so we don't regress. Over time these v3 api features will either be ported to v2.1microversions (the vast majority I expect) or dropped. At that point they'll be moved to tempest testing v2.1microversions. But whatever we do we'll need to test against v2 (which we're stuck with for a very long time) and v2.1microversions (rolling possible backwards incompatible changes to the v2 api) for quite a while. in motivator for doing this is the total run time of tempest, the v3 tests add ~7-10min of time to the gating jobs right now. [1] (which is just a time test, not how it'll be implemented) While this doesn't seem like much it actually would make a big difference in our total throughput. Every little bit counts. There are probably some other less quantifiable benefits to removing the extra testing like for example slightly decreasing the load on nova in an already stressed environment like the gating nodes. So I'd like to propose that we disable running the v3 API tests in the gate. I was thinking we would keep the tests around in tree for as long as there was a v3 API in any supported nova branch, but instead of running them in the gate just have a nightly bit-rot job on the tests and also add it to the experimental queue. I'd really prefer we don't take this route, but its better than nothing. Incidentally the v3 tempest api tests have in the past found race conditions which did theoretically occur in the v2 api as well. Just the different architecture exposed them a bit better. Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I think it'd be OK to move them to the experimental queue and a periodic nightly job until the v2.1 stuff shakes out. The v3 API is marked experimental right now so it seems fitting that it'd be running tests in the experimental queue until at least the spec is approved and microversioning starts happening in the code base. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Gate still backed up - need assistance with nova-network logging enhancements
On 6/10/2014 5:36 AM, Michael Still wrote: https://review.openstack.org/99002 adds more logging to nova/network/manager.py, but I think you're not going to love the debug log level. Was this the sort of thing you were looking for though? Michael On Mon, Jun 9, 2014 at 11:45 PM, Sean Dague s...@dague.net wrote: Based on some back of envelope math the gate is basically processing 2 changes an hour, failing one of them. So if you want to know how long the gate is, take the length / 2 in hours. Right now we're doing a lot of revert roulette, trying to revert things that we think landed about the time things went bad. I call this roulette because in many cases the actual issue isn't well understood. A key reason for this is: *nova network is a blackhole* There is no work unit logging in nova-network, and no attempted verification that the commands it ran did a thing. Most of these failures that we don't have good understanding of are the network not working under nova-network. So we could *really* use a volunteer or two to prioritize getting that into nova-network. Without it we might manage to turn down the failure rate by reverting things (or we might not) but we won't really know why, and we'll likely be here again soon. -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I mentioned this in the nova meeting today also but the assocated bug for the nova-network ssh timeout issue is bug 1298472 [1]. My latest theory on that one is if there could be a race/network leak in the ec2 third party tests in Tempest or something in the ec2 API in nova, because I saw this [2] showing up in the n-net logs. My thinking is the tests or the API are not tearing down cleanly and eventually network resources are leaked and we start hitting those timeouts. Just a theory at this point, but the ec2 3rd party tests do run concurrently with the scenario tests so things could be colliding at that point, but I haven't had time to dig into it, plus I have very little experience in those tests or the ec2 API in nova. [1] https://bugs.launchpad.net/tempest/+bug/1298472 [2] http://goo.gl/6f1dfw -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Message level security plans.
On 6/12/2014 10:08 AM, Kelsey, Timothy Joh wrote: Hello OpenStack folks, First please allow me to introduce myself, my name is Tim Kelsey and I’m a security developer working at HP. I am very interested in projects like Kite and the work that’s being undertaken to introduce message level security into OpenStack and would love to help out on that front. In an effort to ascertain the current state of development it would be great to hear from the people who are involved in this and find out what's being worked on or planned in blueprints. Many Thanks, -- Tim Kelsey Cloud Security Engineer HP Helion ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Are you talking about log messages or RPC messages? For log messages, there is a thread that started yesterday on masking auth tokens [1]. If RPC, I'm aware of at least one issue filed against Qpid [2] for allowing a way to tell Qpid not to log a message since it might contain sensitive information (like auth tokens). Looks like there is also an older blueprint for trusted messaging here [3]. [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/037345.html [2] https://issues.apache.org/jira/browse/QPID-5772 [3] https://blueprints.launchpad.net/oslo.messaging/+spec/trusted-messaging -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Gate proposal - drop Postgresql configurations in the gate
On 6/12/2014 9:38 AM, Mike Bayer wrote: On 6/12/14, 8:26 AM, Julien Danjou wrote: On Thu, Jun 12 2014, Sean Dague wrote: That's not cacthable in unit or functional tests? Not in an accurate manner, no. Keeping jobs alive based on the theory that they might one day be useful is something we just don't have the liberty to do any more. We've not seen an idle node in zuul in 2 days... and we're only at j-1. j-3 will be at least +50% of this load. Sure, I'm not saying we don't have a problem. I'm just saying it's not a good solution to fix that problem IMHO. Just my 2c without having a full understanding of all of OpenStack's CI environment, Postgresql is definitely different enough that MySQL strict mode could still allow issues to slip through quite easily, and also as far as capacity issues, this might be longer term but I'm hoping to get database-related tests to be lots faster if we can move to a model that spends much less time creating databases and schemas. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Is there some organization out there that uses PostgreSQL in production that could stand up 3rd party CI with it? I know that at least for the DB2 support we're adding across the projects we're doing 3rd party CI for that. Granted it's a proprietary DB unlike PG but if we're talking about spending resources on testing for something that's not widely used, but there is a niche set of users that rely on it, we could/should move that to 3rd party CI. I'd much rather see us spend our test resources on getting multi-node testing running in the gate so we can test migrations in Nova. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][Olso] Periodic task coalescing
On 6/12/2014 8:55 AM, Tom Cammann wrote: Hello, I'm addressing https://bugs.launchpad.net/oslo/+bug/1326020 which is dealing with periodic tasks. There is currently a code block that checks if a task is 0.2 seconds away from being run and if so it run now instead. Essentially coalescing nearby tasks together. From oslo-incubator/openstack/common/periodic_task.py:162 # If a periodic task is _nearly_ due, then we'll run it early idle_for = min(idle_for, spacing) if last_run is not None: delta = last_run + spacing - time.time() if delta 0.2: idle_for = min(idle_for, delta) continue However the resolution in the config for various periodic tasks is by the second, and I have been unable to find a task that has a millisecond resolution. I intend to get rid of this coalescing in this bug fix. It fits in with this bug fix as I intend to make the tasks run on their specific spacing boundaries, i.e. if spacing is 10 seconds, it will run at 17:30:10, 17:30:20, etc. Is there any reason to keep the coalescing of tasks? Thanks, Tom ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Seems reasonable to remove this. For historical context, it looks like this code was moved over to oslo-incubator from nova in early Havana [1]. Going back to grizzly-eol on nova, the periodic task code was in nova.manager. From what I can tell, the 0.2 check was added here [2]. There isn't really an explicit statement about why that was added in the commit message or the related bug though. Maybe it had something to do with the tests or the dynamic looping call that was added? You could see if Michael (mikal) remembers. [1] https://review.openstack.org/#/c/25885/ [2] https://review.openstack.org/#/c/18618/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][QA] Disabling the v3 API tests in the gate
On 6/12/2014 10:51 AM, Matthew Treinish wrote: On Fri, Jun 13, 2014 at 12:41:19AM +0930, Christopher Yeoh wrote: On Fri, Jun 13, 2014 at 12:25 AM, Dan Smith d...@danplanet.com wrote: I think it'd be OK to move them to the experimental queue and a periodic nightly job until the v2.1 stuff shakes out. The v3 API is marked experimental right now so it seems fitting that it'd be running tests in the experimental queue until at least the spec is approved and microversioning starts happening in the code base. I think this is reasonable. Continuing to run the full set of tests on every patch for something we never expect to see the light of day (in its current form) seems wasteful to me. Plus, we're going to (presumably) be ramping up tests on v2.1, which means to me that we'll need to clear out some capacity to make room for that. Thats true, though I was suggesting as v2.1microversions rolls out we drop the test out of v3 and move it to v2.1microversions testing, so there's no change in capacity required. That's why I wasn't proposing that we rip the tests out of the tree. I'm just trying to weigh the benefit of leaving them enabled on every run against the increased load they cause in an arguably overworked gate. Matt - how much of the time overhead is scenario tests? That's something that would have a lot less impact if moved to and experimental queue. Although the v3 api as a whole won't be officially exposed, the api tests test specific features fairly indepdently which are slated for v2.1microversions on a case by case basis and I don't want to see those regress. I guess my concern is how often the experimental queue results get really looked at and how hard/quick it is to revert when lots of stuff merges in a short period of time) The scenario tests tend to be the slower tests in tempest. I have to disagree that removing them would have lower impact. The scenario tests provide the best functional verification, which is part of the reason we always have failures in the gate on them. While it would make the gate faster the decrease in what were testing isn't worth it. Also, for reference I pulled the test run times that were greater than 10sec out of a recent gate run: http://paste.openstack.org/show/83827/ The experimental jobs aren't automatically run, they have to be manually triggered by leaving a 'check experimental' comment. So for changes that we want to test the v3 api on a comment would have to left. To prevent regression is why we'd also have the nightly job, which I think is a better compromise for the v3 tests while we wait to migrate them to the v2.1 microversion tests. Another, option is that we make the v3 job run only on the check queue and not on the gate. But the benefits of that are slightly more limited, because we'd still be holding up the check queue. -Matt Treinish ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Yeah the scenario tests need to stay, that's how we've exposed the two big ssh bugs in the last couple of weeks which are obvious issues at scale. I still think experimental/periodic is the way to go, not a hybrid of check-on/gate-off. If we want to explicitly test v3 API changes we can do that with 'recheck experimental'. Granted someone has to remember to run those, much like checking/rechecking 3rd party CI results. One issue I've had with the nightly periodic job is finding out where the results are in an easy to consume format. Is there something out there for that? 
I'm thinking specifically of things we've turned off in the gate before like multi-backend volume tests and allow_tenant_isolation=False. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Message level security plans.
On 6/12/2014 10:31 AM, Kelsey, Timothy Joh wrote: Thanks for the info Matt, I guess I should have been clearer about what I was asking. I was indeed referring to the trusted RPC messaging proposal you linked. Im keen to find out whats happening with that and where I can help. Looks like there was a short related thread in the dev list last month: http://lists.openstack.org/pipermail/openstack-dev/2014-May/034392.html -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Gate still backed up - need assistance with nova-network logging enhancements
On 6/12/2014 10:41 AM, Davanum Srinivas wrote: Hey Matt, There is a connection pool in https://github.com/boto/boto/blob/develop/boto/connection.py which could be causing issues... -- dims On Thu, Jun 12, 2014 at 10:50 AM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 6/10/2014 5:36 AM, Michael Still wrote: https://review.openstack.org/99002 adds more logging to nova/network/manager.py, but I think you're not going to love the debug log level. Was this the sort of thing you were looking for though? Michael On Mon, Jun 9, 2014 at 11:45 PM, Sean Dague s...@dague.net wrote: Based on some back of envelope math the gate is basically processing 2 changes an hour, failing one of them. So if you want to know how long the gate is, take the length / 2 in hours. Right now we're doing a lot of revert roulette, trying to revert things that we think landed about the time things went bad. I call this roulette because in many cases the actual issue isn't well understood. A key reason for this is: *nova network is a blackhole* There is no work unit logging in nova-network, and no attempted verification that the commands it ran did a thing. Most of these failures that we don't have good understanding of are the network not working under nova-network. So we could *really* use a volunteer or two to prioritize getting that into nova-network. Without it we might manage to turn down the failure rate by reverting things (or we might not) but we won't really know why, and we'll likely be here again soon. -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I mentioned this in the nova meeting today also but the assocated bug for the nova-network ssh timeout issue is bug 1298472 [1]. My latest theory on that one is if there could be a race/network leak in the ec2 third party tests in Tempest or something in the ec2 API in nova, because I saw this [2] showing up in the n-net logs. My thinking is the tests or the API are not tearing down cleanly and eventually network resources are leaked and we start hitting those timeouts. Just a theory at this point, but the ec2 3rd party tests do run concurrently with the scenario tests so things could be colliding at that point, but I haven't had time to dig into it, plus I have very little experience in those tests or the ec2 API in nova. [1] https://bugs.launchpad.net/tempest/+bug/1298472 [2] http://goo.gl/6f1dfw -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev mtreinish also pointed out that the nightly periodic job to run tempest with nova-network and without tenant isolation is failing from hitting over quotas on floating IPs [1]. That's also hitting security group rule failures [2], possibly those are related. [1] http://logs.openstack.org/periodic-qa/periodic-tempest-dsvm-full-non-isolated-master/b92b844/console.html#_2014-06-12_08_02_55_875 [2] http://logs.openstack.org/periodic-qa/periodic-tempest-dsvm-full-non-isolated-master/b92b844/console.html#_2014-06-12_08_02_56_623 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Review guidelines for API patches
On 6/12/2014 5:58 PM, Christopher Yeoh wrote: On Fri, Jun 13, 2014 at 8:06 AM, Michael Still mi...@stillhq.com mailto:mi...@stillhq.com wrote: In light of the recent excitement around quota classes and the floating ip pollster, I think we should have a conversation about the review guidelines we'd like to see for API changes proposed against nova. My initial proposal is: - API changes should have an associated spec +1 - API changes should not be merged until there is a tempest change to test them queued for review in the tempest repo +1 Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev We do have some API change guidelines here [1]. I don't want to go overboard on every change and require a spec if it's not necessary, i.e. if it falls into the 'generally ok' list in that wiki. But if it's something that's not documented as a supported API (so it's completely new) and is pervasive (going into novaclient so it can be used in some other service), then I think that warrants some spec consideration so we don't miss something. To compare, this [2] is an example of something that is updating an existing API but I don't think warrants a blueprint since I think it falls into the 'generally ok' section of the API change guidelines. [1] https://wiki.openstack.org/wiki/APIChangeGuidelines [2] https://review.openstack.org/#/c/99443/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Review guidelines for API patches
On 6/12/2014 8:58 PM, Matt Riedemann wrote: On 6/12/2014 5:58 PM, Christopher Yeoh wrote: On Fri, Jun 13, 2014 at 8:06 AM, Michael Still mi...@stillhq.com mailto:mi...@stillhq.com wrote: In light of the recent excitement around quota classes and the floating ip pollster, I think we should have a conversation about the review guidelines we'd like to see for API changes proposed against nova. My initial proposal is: - API changes should have an associated spec +1 - API changes should not be merged until there is a tempest change to test them queued for review in the tempest repo +1 Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev We do have some API change guidelines here [1]. I don't want to go overboard on every change and require a spec if it's not necessary, i.e. if it falls into the 'generally ok' list in that wiki. But if it's something that's not documented as a supported API (so it's completely new) and is pervasive (going into novaclient so it can be used in some other service), then I think that warrants some spec consideration so we don't miss something. To compare, this [2] is an example of something that is updating an existing API but I don't think warrants a blueprint since I think it falls into the 'generally ok' section of the API change guidelines. [1] https://wiki.openstack.org/wiki/APIChangeGuidelines [2] https://review.openstack.org/#/c/99443/ I think I'd like to say I think something about something a few more times... :) -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Gate proposal - drop Postgresql configurations in the gate
On 6/12/2014 5:11 PM, Michael Still wrote: On Thu, Jun 12, 2014 at 10:06 PM, Sean Dague s...@dague.net wrote: We're definitely deep into capacity issues, so it's going to be time to start making tougher decisions about things we decide aren't different enough to bother testing on every commit. I think one of the criticisms that could be made about OpenStack at the moment is that we're not opinionated enough. We have a lot of bugs because we support huge numbers of drivers of varying quality and completeness. Do you think its time for the gate to be an opinionated set of tests of how OpenStack can be deployed? Perhaps we should gate on only one permutation of a possible OpenStack cloud, and then let people who want to propose deviations from that permutation run their own CI as third parties. I'm not particularly advocating this stance, but it is an option and I'd like to see it explored a bit more. Michael Yeah was sort of thinking along the same lines - does any of the survey data help here, i.e. what's the percentage of deployments using mysql vs postgresql? Another example is we want testing for Ceph/Rbd but I don't expect that to be in the upstream CI/gate, I more or less expect that from some 3rd party CI run by someone using it in production and really really cares about it's quality and maintenance in the tree. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][ceilometer] FloatingIp pollster spamming n-api logs (bug 1328694)
On 6/12/2014 10:31 AM, John Garbutt wrote: On 11 June 2014 20:07, Joe Gordon joe.gord...@gmail.com wrote: On Wed, Jun 11, 2014 at 11:38 AM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 6/11/2014 10:01 AM, Eoghan Glynn wrote: Thanks for bringing this to the list Matt, comments inline ... tl;dr: some pervasive changes were made to nova to enable polling in ceilometer which broke some things and in my opinion shouldn't have been merged as a bug fix but rather should have been a blueprint. === The detailed version: I opened bug 1328694 [1] yesterday and found that came back to some changes made in ceilometer for bug 1262124 [2]. Upon further inspection, the original ceilometer bug 1262124 made some changes to the nova os-floating-ips API extension and the database API [3], and changes to python-novaclient [4] to enable ceilometer to use the new API changes (basically pass --all-tenants when listing floating IPs). The original nova change introduced bug 1328694 which spams the nova-api logs due to the ceilometer change [5] which does the polling, and right now in the gate ceilometer is polling every 15 seconds. IIUC that polling cadence in the gate is in the process of being reverted to the out-of-the-box default of 600s. I pushed a revert in ceilometer to fix the spam bug and a separate patch was pushed to nova to fix the problem in the network API. Thank you for that. The revert is just now approved on the ceilometer side, and is wending its merry way through the gate. The bigger problem I see here is that these changes were all made under the guise of a bug when I think this is actually a blueprint. We have changes to the nova API, changes to the nova database API, CLI changes, potential performance impacts (ceilometer can be hitting the nova database a lot when polling here), security impacts (ceilometer needs admin access to the nova API to list floating IPs for all tenants), documentation impacts (the API and CLI changes are not documented), etc. So right now we're left with, in my mind, two questions: 1. Do we just fix the spam bug 1328694 and move on, or 2. Do we revert the nova API/CLI changes and require this goes through the nova-spec blueprint review process, which should have happened in the first place. So just to repeat the points I made on the unlogged #os-nova IRC channel earlier, for posterity here ... Nova already exposed an all_tenants flag in multiple APIs (servers, volumes, security-groups etc.) and these would have: (a) generally pre-existed ceilometer's usage of the corresponding APIs and: (b) been tracked and proposed at the time via straight-forward LP bugs, as opposed to being considered blueprint material So the manner of the addition of the all_tenants flag to the floating_ips API looks like it just followed existing custom practice. Though that said, the blueprint process and in particular the nova-specs aspect, has been tightened up since then. My preference would be to fix the issue in the underlying API, but to use this as a teachable moment ... i.e. to require more oversight (in the form of a reviewed approved BP spec) when such API changes are proposed in the future. Cheers, Eoghan Are there other concerns here? If there are no major objections to the code that's already merged, then #2 might be excessive but we'd still need docs changes. I've already put this on the nova meeting agenda for tomorrow. 
[1] https://bugs.launchpad.net/ceilometer/+bug/1328694 [2] https://bugs.launchpad.net/nova/+bug/1262124 [3] https://review.openstack.org/#/c/81429/ [4] https://review.openstack.org/#/c/83660/ [5] https://review.openstack.org/#/c/83676/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev While there is precedent for --all-tenants with some of the other APIs, I'm concerned about where this stops. When ceilometer wants polling on some other resources that the nova API exposes, will it need the same thing? Doing all of this polling for resources in all tenants in nova puts an undue burden on the nova API and the database. Can we do something with notifications here instead? That's where the nova-spec process would have probably caught this. ++ to notifications and not polling. Yeah, I think we need to revert this, and go through the specs process. It's been released in Juno-1 now, so this revert feels bad, but perhaps it's the best of a bad situation? Word of caution, we need to get notifications versioned correctly if we want this as a more formal external API. I think Heat has similar issues in this area, efficiently knowing about something happening in Nova. So we do need
Re: [openstack-dev] [Nova] Nominating Ken'ichi Ohmichi for nova-core
On 6/14/2014 5:40 AM, Sean Dague wrote: On 06/13/2014 06:40 PM, Michael Still wrote: Greetings, I would like to nominate Ken'ichi Ohmichi for the nova-core team. Ken'ichi has been involved with nova for a long time now. His reviews on API changes are excellent, and he's been part of the team that has driven the new API work we've seen in recent cycles forward. Ken'ichi has also been reviewing other parts of the code base, and I think his reviews are detailed and helpful. Please respond with +1s or any concerns. +1 References: https://review.openstack.org/#/q/owner:ken1ohmichi%2540gmail.com+status:open,n,z https://review.openstack.org/#/q/reviewer:ken1ohmichi%2540gmail.com,n,z http://www.stackalytics.com/?module=nova-groupuser_id=oomichi As a reminder, we use the voting process outlined at https://wiki.openstack.org/wiki/Nova/CoreTeam to add members to our core team. Thanks, Michael ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev +1 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][qa] Do all turbo-hipster jobs fail in stable/havana?
On 6/16/2014 11:58 PM, Joshua Hesketh wrote: Hi there, Very sorry for the mishap. I manually enqueued our zuul to run tests on changes that turbo-hipster had recently missed and did not pay attention to the branch they were for. Turbo-Hipster doesn't run tests on stable or non-master branches so it should have never attempted to. Because I enqueued the changes manually it accidentally attempted to run them and didn't know how to handle it correctly. I have removed the negative votes. Please let me know if I have missed any. Sorry again for the trouble. Cheers, Josh On 6/17/14 11:44 AM, wu jiang wrote: Hi all, Is turbo-hipster OK for stable/havana? I found all turbo-hipster jobs after 06/09 failed in stable/havana [1]. And the 'recheck migrations' command didn't trigger the re-examination of turbo-hipster, but a Jenkins recheck does work. Thanks. WingWJ --- [1] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/havana,n,z [2] https://review.openstack.org/#/c/67613/ [3] https://review.openstack.org/#/c/72521/ [4] https://review.openstack.org/#/c/98874/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Yeah, I have some on stable/icehouse with -1 votes from t-h: https://review.openstack.org/#/c/99215/ https://review.openstack.org/#/c/97811/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] nova default quotas
On 6/10/2014 3:56 PM, Matt Riedemann wrote: On 6/4/2014 11:02 AM, Day, Phil wrote: Matt and I chatted on IRC and have come up with an outlined plan; if we missed anything, please don't hesitate to comment or ask. https://etherpad.openstack.org/p/quota-classes-goof-up I added a few thoughts / questions From: Joe Gordon [mailto:joe.gord...@gmail.com] Sent: 02 June 2014 21:52 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] nova default quotas On Mon, Jun 2, 2014 at 12:29 PM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 6/2/2014 12:53 PM, Joe Gordon wrote: On Thu, May 29, 2014 at 10:46 AM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 5/27/2014 4:44 PM, Vishvananda Ishaya wrote: I’m not sure that this is the right approach. We really have to add the old extension back for compatibility, so it might be best to simply keep that extension instead of adding a new way to do it. Vish On May 27, 2014, at 1:31 PM, Cazzolato, Sergio J sergio.j.cazzol...@intel.com wrote: I have created a blueprint to add this functionality to nova. https://review.openstack.org/#/c/94519/ -Original Message- From: Vishvananda Ishaya [mailto:vishvana...@gmail.com] Sent: Tuesday, May 27, 2014 5:11 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] nova default quotas Phil, You are correct and this seems to be an error. I don't think in the earlier ML thread[1] that anyone remembered that the quota classes were being used for default quotas. IMO we need to revert this removal as we (accidentally) removed a Havana feature with no notification to the community. I've reactivated a bug[2] and marked it critical. Vish [1] http://lists.openstack.org/pipermail/openstack-dev/2014-February/027574.html [2] https://bugs.launchpad.net/nova/+bug/1299517 On May 27, 2014, at 12:19 PM, Day, Phil philip@hp.com wrote: Hi Vish, I think quota classes have been removed from Nova now. Phil Sent from Samsung Mobile Original message From: Vishvananda Ishaya Date: 27/05/2014 19:24 (GMT+00:00) To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] nova default quotas Are you aware that there is already a way to do this through the cli using quota-class-update? http://docs.openstack.org/user-guide-admin/content/cli_set_quotas.html (near the bottom) Are you suggesting that we also add the ability to use just regular quota-update? I'm not sure I see the need for both. Vish On May 20, 2014, at 9:52 AM, Cazzolato, Sergio J sergio.j.cazzol...@intel.com wrote: I would like to hear your thoughts about an idea to add a way to manage the default quota values through the API
[openstack-dev] [neutron][nova] nova needs a new release of neutronclient for OverQuotaClient exception
There are at least two changes [1][2] proposed to Nova that use the new OverQuotaClient exception in python-neutronclient, but the unit test jobs no longer test against trunk-level code of the client packages so they fail. So I'm here to lobby for a new release of python-neutronclient if possible so we can keep these fixes moving. Are there any issues with that? [1] https://review.openstack.org/#/c/62581/ [2] https://review.openstack.org/#/c/101462/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [qa][infra] etherpad on elastic-recheck testing improvements
Sean asked me to jot some thoughts down on how we can automate some of our common review criteria for elastic-recheck queries, so that's here: https://etherpad.openstack.org/p/elastic-recheck-testing There is some low-hanging-fruit in there I think, but the bigger / more useful change is actually automating running the proposed query against ES and validating the results within some defined criteria. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] milestone-proposed is dead, long lives proposed/foo
On 6/27/2014 7:44 AM, Thierry Carrez wrote: Hi everyone, Since the dawn of time, we have been using milestone-proposed branches for milestone and final release branches. Those would get milestone-critical and release-critical bugfix backports, while the master branch can continue to be open for development. However, reusing the same blanket name for every such branch is causing various issues, especially around upgrade testing. It also creates havoc in local repositories which may have kept traces of previous incarnations of milestone-proposed. For all those reasons, we decided at the last summit to use unique pre-release branches, named after the series (for example, proposed/juno). That branch finally becomes stable/juno at release time. In parallel, we abandoned the usage of release branches for development milestones, which are now tagged directly on the master development branch. The visible impact of this change will be apparent when we reach Juno RC1s. RC bugfixes will have to be backported to proposed/juno instead of milestone-proposed. Tarballs automatically generated from this branch will be named PROJECT-proposed-juno.tar.gz instead of PROJECT-milestone-proposed.tar.gz. All relevant process wiki pages will be adapted to match the new names in the coming weeks. We are also generally changing[1] ACLs which used to apply to milestone-proposed branches so that they now apply to proposed/* branches. If you're a stackforge or non-integrated project which made use of milestone-proposed branches, you should probably switch to using proposed/foo branches when that patch lands. [1] https://review.openstack.org/#/c/102822/ Regards, We've been using a similar concept internally (we call it havana-proposed, icehouse-proposed, etc.), but it sounds like the same idea. We're supporting more than the last 2 stable releases and it's hard to tell when we need to quickly turn out a release candidate build (like for a security issue going back to Folsom or something), so it makes sense for us to try and avoid branch naming collisions. Besides the branch naming, we basically follow the same release process as upstream based on the same wikis, with some additional automation for tagging and cleanup after the release. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Log / error message format best practices standards
On 6/26/2014 1:54 PM, Jay Pipes wrote: On 06/26/2014 12:14 PM, boden wrote: We were recently having a discussion over here in trove regarding a standardized format to use for log and error messages - obviously consistency is ideal (within and across projects). As this discussion involves the broader dev community, bringing this topic to the list for feedback... I'm aware of the logging standards wiki [1]; however, this page does not describe in depth a standardized format to use for log / error messages. In particular w/r/t program values in messages: (a) For in-line program values, I've seen both single quoted and unquoted formatting used. e.g. single quote: LOG.info("The ID '%s' is not valid." % (resource.id)) unquoted: LOG.info("The ID %s is not valid." % (resource.id)) No opinion on this one. (b) For program values appended to the message, I've seen various formats used. e.g. LOG.info("This path is invalid: %s" % (obj.path)) LOG.info("This path is invalid %s" % (obj.path)) LOG.info("This path is invalid - %s" % (obj.path)) The first would be my preference (i.e. using a ':' to delineate the target of the log message) From a consistency perspective, it seems we should consider standardizing a best practice for such formatting. Possibly, though this is likely getting into the realm of femto-nits and bike-shedding. Ha, you read my mind, i.e. bike-shedding. There are a few wikis and devref docs on style guides in openstack including logging standards, I'd say make sure there is common sense in there and then leave the rest to the review team to police the logs in new changes - if it's ugly, change it with a patch. We don't need to boil the ocean to develop a set of standards/processes that are so heavyweight that people aren't going to follow anyway. This sounds exactly like the kind of thing I see a lot within the workings of my corporate overlord and it drives me crazy, so I'm a bit biased here. :) FWIW, Sean Dague has a draft logging standards spec for nova here: https://review.openstack.org/#/c/91446/ For in-line values (#a above) I find single quotes the most consumable as they are a clear indication the value came from code and moreover provide a clear set of delimiters around the value. However to date unquoted appears to be the most widely used. For appended values (#b above) I find a delimiter such as ':' most consumable as it provides a clear boundary between the message and value. Using ':' seems fairly common today, but you'll find other formatting throughout the code. If we wanted to squash this topic the high level steps are (approximately): - Determine and document message format. - Ensure the format is part of the dev process (coding + review). - Cross team work to address existing messages not following the format. Thoughts / comments? [1] https://wiki.openstack.org/wiki/LoggingStandards ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
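To make the two preferred conventions from that thread concrete, here is a minimal sketch (the logger setup and the values are made up for illustration):

    import logging

    LOG = logging.getLogger(__name__)

    resource_id = 'b6f3b2aa'        # hypothetical value
    obj_path = '/var/lib/nova/foo'  # hypothetical value

    # (a) in-line value, single-quoted so the value's boundaries are obvious:
    LOG.info("The ID '%s' is not valid.", resource_id)

    # (b) appended value, with ':' delimiting the message from the value:
    LOG.info("This path is invalid: %s", obj_path)

Passing the value as a logging argument instead of pre-formatting with % also defers the string interpolation until the record is actually emitted, which is the usual stdlib logging idiom.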
[openstack-dev] [trove] how to trigger a recheck of reddwarf CI?
The reddwarf 3rd party CI is failing on an oslo sync patch [1] but Jenkins is fine. I'm unable to find any wiki or guideline on how to recheck just the reddwarf CI; is that possible? [1] https://review.openstack.org/#/c/103232/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova][i18n] why isn't the client translated?
I noticed that there is no locale directory or setup.cfg entry for babel, which surprises me. The v1_1 shell in python-novaclient has a lot of messages marked for translation using the _() function but the v3 shell doesn't, presumably because someone figured out we don't translate the client messages anyway. I'm just wondering why we don't translate the client? -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
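If the client were ever wired up for translation, the plumbing would look roughly like the babel sections the servers carry in setup.cfg (a sketch based on the nova layout; python-novaclient has no such entry today, which is the point of the question):

    [extract_messages]
    keywords = _ gettext ngettext l_ lazy_gettext
    mapping_file = babel.cfg
    output_file = novaclient/locale/novaclient.pot

    [compile_catalog]
    directory = novaclient/locale
    domain = novaclient

    [update_catalog]
    domain = novaclient
    output_dir = novaclient/locale
    input_file = novaclient/locale/novaclient.pot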
[openstack-dev] [nova] how to unit test scripts outside of nova/nova?
As part of the enforce-unique-instance-uuid-in-db blueprint [1] I'm writing a script to scan the database and find any NULL instance_uuid records that will cause the new database migration to fail, so that operators can run it before they run the migration; otherwise the migration blocks if these types of records are found. I have the script written [2], but wanted to also write unit tests for it. I guess I assumed the script would go under nova/tools/db like the schema_diff.py script, but I'm not sure how to unit test anything outside of the nova/nova tree. Nova's testr configuration is only discovering tests within nova/tests [3]. But I don't think I can put the unit tests under nova/tests and then import the module from nova/tools. So I'm a bit stuck. I could take the easy way out and just throw the script under nova/db/sqlalchemy/migrate_repo and put my unit tests under nova/tests/db/, and I'd also get pep8 checking with that, but that doesn't seem right. Then again, I'm possibly over-thinking this. Anyone else have any ideas? [1] https://blueprints.launchpad.net/nova/+spec/enforce-unique-instance-uuid-in-db [2] https://review.openstack.org/#/c/97946/ [3] http://git.openstack.org/cgit/openstack/nova/tree/.testr.conf#n5 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
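For illustration, the core of such a scan is a query along these lines (a sketch against the instances table; the actual script in [2] may differ and also has to handle soft-deleted rows and related tables):

    -- Records that would violate the new NOT NULL / unique constraint:
    SELECT id, deleted
    FROM instances
    WHERE uuid IS NULL;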
Re: [openstack-dev] [nova] how to unit test scripts outside of nova/nova?
On 7/1/2014 4:03 PM, Matthew Treinish wrote: On Tue, Jul 01, 2014 at 03:21:06PM -0500, Matt Riedemann wrote: As part of the enforce-unique-instance-uuid-in-db blueprint [1] I'm writing a script to scan the database and find any NULL instance_uuid records that will cause the new database migration to fail, so that operators can run it before they run the migration; otherwise the migration blocks if these types of records are found. I have the script written [2], but wanted to also write unit tests for it. I guess I assumed the script would go under nova/tools/db like the schema_diff.py script, but I'm not sure how to unit test anything outside of the nova/nova tree. Nova's testr configuration is only discovering tests within nova/tests [3]. But I don't think I can put the unit tests under nova/tests and then import the module from nova/tools. So we hit a similar issue in tempest when we wanted to unit test some utility scripts in tempest/tools. Changing the discovery path to find tests outside of nova/tests is actually a pretty easy change [4], but I don't think that will solve the use case with tox. What happened when we tried to do this in the tempest case was that when the project was getting installed the tools dir wasn't included, so when we ran with tox it couldn't find the files we were trying to test. The solution we came up with there was to put the script under the tempest namespace and add unit tests in tempest/tests. (we also added an entry point for the script to expose it as a command when tempest was installed) So I'm a bit stuck. I could take the easy way out and just throw the script under nova/db/sqlalchemy/migrate_repo and put my unit tests under nova/tests/db/, and I'd also get pep8 checking with that, but that doesn't seem right. Then again, I'm possibly over-thinking this. Anyone else have any ideas? I think it really comes down to how you want to present the utility to the end users. To enable unit testing it, it's just easier to put it in the nova namespace. I couldn't come up with a good way to get around the install/namespace issue. (maybe someone else who is more knowledgeable here has a good way to get around this) So then you can symlink it to the tools dir or add an entry point (or bake it into nova-manage) to make it easy to find. I think the issue with putting it in nova/db/sqlalchemy/migrate_repo is that it's hard to find. [1] https://blueprints.launchpad.net/nova/+spec/enforce-unique-instance-uuid-in-db [2] https://review.openstack.org/#/c/97946/ [3] http://git.openstack.org/cgit/openstack/nova/tree/.testr.conf#n5 [4] http://git.openstack.org/cgit/openstack/tempest/tree/tempest/test_discover/test_discover.py -Matt Treinish ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Matt, thanks for the help. I completely forgot about making the new script an entry point in setup.cfg, that's a good idea. Before I saw this I did move the script under nova/db/sqlalchemy/migrate_repo and moved the tests under nova/tests/db and have that all working now, so will probably just move forward with that rather than try to do some black magic with test discovery and getting the module imported. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
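For reference, the entry-point approach mentioned above looks roughly like this in setup.cfg (the script name and module path below are hypothetical, not from the actual review):

    [entry_points]
    console_scripts =
        nova-null-instance-uuid-scan = nova.db.sqlalchemy.null_instance_uuid_scan:main

With that, the module lives in the nova namespace where the unit tests can import it, and installation exposes it as a command without needing a symlink into tools/.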
Re: [openstack-dev] [oslo] Openstack and SQLAlchemy
On 7/2/2014 8:23 PM, Mike Bayer wrote: I've just added a new section to this wiki, MySQLdb + eventlet = sad, summarizing some discussions I've had in the past couple of days about the ongoing issue that MySQLdb and eventlet were not meant to be used together. This is a big one to solve as well (though I think it's pretty easy to solve). https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#MySQLdb_.2B_eventlet_.3D_sad On 6/30/14, 12:56 PM, Mike Bayer wrote: Hi all - For those who don't know me, I'm Mike Bayer, creator/maintainer of SQLAlchemy, Alembic migrations and Dogpile caching. In the past month I've become a full time Openstack developer working for Red Hat, given the task of carrying Openstack's database integration story forward. To that extent I am focused on the oslo.db project which going forward will serve as the basis for database patterns used by other Openstack applications. I've summarized what I've learned from the community over the past month in a wiki entry at: https://wiki.openstack.org/wiki/Openstack_and_SQLAlchemy The page also refers to an ORM performance proof of concept which you can see at https://github.com/zzzeek/nova_poc. The goal of this wiki page is to publish to the community what's come up for me so far, to get additional information and comments, and finally to help me narrow down the areas in which the community would most benefit by my contributions. I'd like to get a discussion going here, on the wiki, on IRC (where I am on freenode with the nickname zzzeek) with the goal of solidifying the blueprints, issues, and SQLAlchemy / Alembic features I'll be focusing on as well as recruiting contributors to help in all those areas. I would welcome contributors on the SQLAlchemy / Alembic projects directly as well, as we have many areas that are directly applicable to Openstack. I'd like to thank Red Hat and the Openstack community for welcoming me on board and I'm looking forward to digging in more deeply in the coming months! - mike ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Regarding the eventlet + mysql sadness, I remembered this [1] in the nova.db.api code. I'm not sure if that's just nova-specific right now, I'm a bit too lazy at the moment to check if it's in other projects, but I'm not seeing it in neutron, for example, and makes me wonder if it could help with the neutron db lock timeouts we see in the gate [2]. Don't let the bug status fool you, that thing is still showing up, or a variant of it is. There are at least 6 lock-related neutron bugs hitting the gate [3]. [1] https://review.openstack.org/59760 [2] https://bugs.launchpad.net/neutron/+bug/1283522 [3] http://status.openstack.org/elastic-recheck/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Openstack and SQLAlchemy
On 7/7/2014 3:28 PM, Jay Pipes wrote: On 07/07/2014 04:17 PM, Mike Bayer wrote: On 7/7/14, 3:57 PM, Matt Riedemann wrote: Regarding the eventlet + mysql sadness, I remembered this [1] in the nova.db.api code. I'm not sure if that's just nova-specific right now, I'm a bit too lazy at the moment to check if it's in other projects, but I'm not seeing it in neutron, for example, and makes me wonder if it could help with the neutron db lock timeouts we see in the gate [2]. Don't let the bug status fool you, that thing is still showing up, or a variant of it is. There are at least 6 lock-related neutron bugs hitting the gate [3]. [1] https://review.openstack.org/59760 [2] https://bugs.launchpad.net/neutron/+bug/1283522 [3] http://status.openstack.org/elastic-recheck/ yeah, tpool, correct me if I'm misunderstanding, we take some API code that is 90% fetching from the database, we have it all under eventlet, the purpose of which is, IO can be shoveled out to an arbitrary degree, e.g. 500 concurrent connections type of thing, but then we take all the IO (MySQL access) and put it into a thread pool anyway. Yep. It makes no sense to do that, IMO. The solution is to use a non-blocking MySQLdb library which will yield appropriately for evented solutions like gevent and eventlet. Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Yeah, never mind my comment, since it's not working without an eventlet patch; details are in the nova bug here [1]. And it sounds like it's still not 100% with the patch. [1] https://bugs.launchpad.net/nova/+bug/1171601 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
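For anyone following along, the tpool pattern being debated is roughly this (a minimal eventlet sketch, not the actual nova code):

    from eventlet import tpool

    def blocking_query(conn, sql):
        # MySQLdb is a C extension that blocks the whole eventlet hub
        # while waiting on the socket; run it in a real OS thread instead.
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()

    # In a greenthread, dispatch the blocking call to the thread pool:
    rows = tpool.execute(blocking_query, conn, "SELECT 1")

The criticism above is that this caps DB concurrency at the size of the thread pool, which defeats the point of running hundreds of greenthreads in the first place.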
[openstack-dev] [nova] new nasty gate bug 1338844 with nova-network races
I noticed the bug [1] today. Given the trend in logstash, it might be related to some fixes proposed to try and resolve the other big nova ssh timeout bug 1298472. It appears to only be in jobs using nova-network. [1] https://bugs.launchpad.net/nova/+bug/1338844 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] whatever happened to removing instance.locked in icehouse?
I came across this [1] today and noticed the note to remove instance.locked in favor of locked_by is still in master, so apparently it was not removed in Icehouse. Is anyone aware of intentions to remove instance.locked, or do we not care, or is it something else? If we don't care, maybe we should remove the note in the code. I found it and thought about this because the check_instance_lock decorator in nova.compute.api doesn't check the locked_by field [2] but I'm guessing it probably should... [1] https://review.openstack.org/#/c/38196/13/nova/objects/instance.py [2] http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/api.py?id=2014.2.b1#n184 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
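A sketch of what checking both fields might look like (this is hypothetical; per [2], the decorator in master only checks 'locked'):

    import functools

    from nova import exception

    def check_instance_lock(function):
        @functools.wraps(function)
        def inner(self, context, instance, *args, **kwargs):
            # Honor either the legacy boolean or the newer locked_by field,
            # unless an admin is making the call.
            locked = instance['locked'] or instance.get('locked_by')
            if locked and not context.is_admin:
                raise exception.InstanceIsLocked(
                    instance_uuid=instance['uuid'])
            return function(self, context, instance, *args, **kwargs)
        return inner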
Re: [openstack-dev] [nova] new nasty gate bug 1338844 with nova-network races
On 7/7/2014 9:29 PM, Matt Riedemann wrote: I noticed the bug [1] today. Given the trend in logstash, it might be related to some fixes proposed to try and resolve the other big nova ssh timeout bug 1298472. It appears to only be in jobs using nova-network. [1] https://bugs.launchpad.net/nova/+bug/1338844 Looks like jogo got the fix here: https://review.openstack.org/#/c/105651/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [gate] concurrent workers are overwhelming postgresql in the gate - bug 1338841
Bug 1338841 [1] started showing up yesterday and I first noticed it on the change to set osapi_volume_workers equal to the number of CPUs available by default. Similar patches for trove (api/conductor workers) and glance (api/registry workers) have landed in the last week also, and nova has been running with multiple api/conductor workers by default since Icehouse. It looks like the cinder change tipped the default postgresql max_connections over and we started getting asynchronous connection failures in that job. [2] We can also note that the postgresql job is the only one that runs the nova api-metadata service, which has its own workers. The VMs the jobs are running on have 8 VCPUs, so that's at least 88 workers between nova (3), cinder (1), glance (2), trove (2), neutron, heat and ceilometer. So osapi_volume_workers (8) + n-api-meta workers (8) seems to have tipped it over. The first attempt at a fix is to simply double the default max_connections value [3]. While looking up the postgresql configuration docs, I also read a bit on synchronous_commit=off and fsync=off, which sound like we might want to also think about using one of those in devstack runs since they are supposed to be more performant if you don't care about disaster recovery (which we don't in gate runs on VMs). Anyway, bumping max connections might fix the gate; I'm just sending this out to see if there are any postgresql experts out there with additional tips or insights on things we can tweak or look for, including whether or not it might be worthwhile to set synchronous_commit=off or fsync=off for gate runs. [1] https://bugs.launchpad.net/nova/+bug/1338841 [2] http://goo.gl/yRBDjQ [3] https://review.openstack.org/#/c/105854/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [gate] concurrent workers are overwhelming postgresql in the gate - bug 1338841
On 7/9/2014 2:59 PM, Matt Riedemann wrote: Bug 1338841 [1] started showing up yesterday and I first noticed it on the change to set osapi_volume_workers equal to the number of CPUs available by default. Similar patches for trove (api/conductor workers) and glance (api/registry workers) have landed in the last week also, and nova has been running with multiple api/conductor workers by default since Icehouse. It looks like the cinder change tipped the default postgresql max_connections over and we started getting asynchronous connection failures in that job. [2] We can also note that the postgresql job is the only one that runs the nova api-metadata service, which has its own workers. The VMs the jobs are running on have 8 VCPUs, so that's at least 88 workers between nova (3), cinder (1), glance (2), trove (2), neutron, heat and ceilometer. So osapi_volume_workers (8) + n-api-meta workers (8) seems to have tipped it over. The first attempt at a fix is to simply double the default max_connections value [3]. While looking up the postgresql configuration docs, I also read a bit on synchronous_commit=off and fsync=off, which sound like we might want to also think about using one of those in devstack runs since they are supposed to be more performant if you don't care about disaster recovery (which we don't in gate runs on VMs). Anyway, bumping max connections might fix the gate; I'm just sending this out to see if there are any postgresql experts out there with additional tips or insights on things we can tweak or look for, including whether or not it might be worthwhile to set synchronous_commit=off or fsync=off for gate runs. [1] https://bugs.launchpad.net/nova/+bug/1338841 [2] http://goo.gl/yRBDjQ [3] https://review.openstack.org/#/c/105854/ Typo in my math on the workers; it should be: nova (3*8), cinder (1*8), glance (2*8), trove (2*8), neutron (1), heat (1) and ceilometer (1) = 67. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
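For reference, the knobs discussed above live in postgresql.conf; the values here are illustrative for gate VMs only (fsync=off and synchronous_commit=off trade crash safety for speed, which is fine for throwaway test databases and a terrible idea for real data):

    # postgresql.conf
    max_connections = 200       # stock default is typically 100; [3] doubles it
    synchronous_commit = off    # commit returns before the WAL is flushed
    fsync = off                 # never force writes out to disk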
Re: [openstack-dev] Inter cloud resource federation [Alliance]
On 7/9/2014 12:33 PM, Tiwari, Arvind wrote: Hi All, I am investigating inter-cloud resource federation across OS-based cloud deployments; this is needed to support multi-region, cloud bursting, VPC and more use cases. I came up with a design (link below) which advocates a new service (a.k.a. Alliance); this service sits close to Keystone and helps abstract all the inter-cloud concerns from Keystone. This service will be abstracted from end users and there won’t be any direct interactions between user and Alliance service. Keystone will be delegating all inter cloud concerns to Alliance. https://wiki.openstack.org/wiki/Inter_Cloud_Resource_Federation Apart from basic resource federation use cases, Alliance service will add the following features: 1. UUID token support across cloud 2. PKI Token support 3. Inter Cloud Token Validation 4. Inter Cloud Communication to allow •Region/endpoint Discovery •Service Discovery •Remote Resource Provisioning 5. Resource Access Across Clouds 6. SSO Across Cloud 7. SSOut Across Cloud (or Inter Cloud Token Revocation) 8. Notification to propagate meter info, resource de-provisioning …. I would appreciate it if you take a look and share your perspective. I am open to any questions, suggestions, discussions on the same. Thanks for your time, Arvind Please excuse any typographical errors. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Is this only identity (keystone), or are other things like booting instances in nova from public/private clouds also abstracted from the client? And if so, have you heard of nova-cells? -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] fastest way to run individual tests ?
On 6/12/2014 6:17 AM, Daniel P. Berrange wrote: On Thu, Jun 12, 2014 at 07:07:37AM -0400, Sean Dague wrote: On 06/12/2014 06:59 AM, Daniel P. Berrange wrote: Does anyone have any tip on how to actually run individual tests in an efficient manner. ie something that adds no more than 1 second penalty over above the time to run the test itself. NB, assume that i've primed the virtual env with all prerequisite deps already. The overhead is in the fact that we have to discover the world, then throw out the world. You can actually run an individual test via invoking the testtools.run directly: python -m testtools.run nova.tests.test_versions (Also, when testr explodes because of an import error this is about the only way to debug what's going on). Most excellent, thankyou. I knew someone must know a way to do it :-) Regards, Daniel I've been beating my head against the wall a bit on unit tests too this week, and here is another tip that just uncovered something for me when python -m testtools.run and nosetests didn't help. I sourced the tox virtualenv and then ran the test from there, which gave me the actual error, so something like this: source .tox/py27/bin/activate python -m testtools.run test Props to Matt Odden for helping me with the source of the venv tip. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
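Putting the tips from that thread in one place (the test module name is just an example):

    # Run a single test module directly, skipping testr discovery:
    python -m testtools.run nova.tests.test_versions

    # If that hides the real error, run it from inside the tox virtualenv:
    source .tox/py27/bin/activate
    python -m testtools.run nova.tests.test_versions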
[openstack-dev] [nova][qa] proposal for moving forward on cells/tempest testing
Today we only gate on exercises in devstack for cells testing coverage in the gate-devstack-dsvm-cells job. The cells tempest non-voting job was moving to the experimental queue here [1] since it doesn't work with a lot of the compute API tests. I think we all agreed to tar and feather comstud if he didn't get Tempest working (read: passing) with cells enabled in Juno. The first part of this is just figuring out where we sit with what's failing in Tempest (in the check-tempest-dsvm-cells-full job). I'd like to propose that we do the following to get the ball rolling: 1. Add an option to tempest.conf under the compute-feature-enabled section to toggle cells and then use that option to skip tests that we know will fail in cells, e.g. security group tests. 2. Open bugs for all of the tests we're skipping so we can track closing those down, assuming they aren't already reported. [2] 3. Once the known failures are being skipped, we can move check-tempest-dsvm-cells-full out of the experimental queue. I'm not proposing that it'd be voting right away, I think we have to see it burn in for awhile first. With at least this plan we should be able to move forward on identifying issues and getting some idea for how much of Tempest doesn't work with cells and the effort involved in making it work. Thoughts? If there aren't any objections, I said I'd work on the qa-spec and can start doing the grunt-work of opening bugs and skipping tests. [1] https://review.openstack.org/#/c/87982/ [2] https://bugs.launchpad.net/nova/+bugs?field.tag=cells+ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
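To make step 1 of the proposal concrete, a sketch of what the toggle and a skip might look like (the option name mirrors existing compute-feature-enabled flags and is my assumption, as are the class and test names):

    # In tempest.conf:
    #   [compute-feature-enabled]
    #   cells = False

    import testtools

    from tempest.api.compute import base
    from tempest import config

    CONF = config.CONF

    class SecurityGroupsTest(base.BaseV2ComputeTest):

        @testtools.skipIf(CONF.compute_feature_enabled.cells,
                          'Security groups are not supported with cells.')
        def test_security_group_create(self):
            ...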
[openstack-dev] [oslo][glance] what to do about tons of http connection pool is full warnings in g-api log?
I opened bug 1341777 [1] against glance but it looks like it's due to the default log level for requests.packages.urllib3.connectionpool in oslo's log module. The problem is this warning shows up nearly 420K times in 7 days in Tempest runs: WARNING urllib3.connectionpool [-] HttpConnectionPool is full, discarding connection: 127.0.0.1 So either glance is doing something wrong, or that's logging too high of a level (I think it should be debug in this case). I'm not really sure how to scope this down though, or figure out what is so damn chatty in glance-api that is causing this. It doesn't seem to be causing test failures, but the rate at which this is logged in glance-api is surprising. [1] https://bugs.launchpad.net/glance/+bug/1341777 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo][glance] what to do about tons of http connection pool is full warnings in g-api log?
On 7/14/2014 4:09 PM, Matt Riedemann wrote: I opened bug 1341777 [1] against glance but it looks like it's due to the default log level for requests.packages.urllib3.connectionpool in oslo's log module. The problem is this warning shows up nearly 420K times in 7 days in Tempest runs: WARNING urllib3.connectionpool [-] HttpConnectionPool is full, discarding connection: 127.0.0.1 So either glance is doing something wrong, or that's logging too high of a level (I think it should be debug in this case). I'm not really sure how to scope this down though, or figure out what is so damn chatty in glance-api that is causing this. It doesn't seem to be causing test failures, but the rate at which this is logged in glance-api is surprising. [1] https://bugs.launchpad.net/glance/+bug/1341777 I found this older thread [1] which led to this in oslo [2] but I'm not really sure how to use it to make the connectionpool logging quieter in glance, any guidance there? It looks like in Joe's change to nova for oslo.messaging he just changed the value directly in the log module in nova, something I thought was forbidden. [1] http://lists.openstack.org/pipermail/openstack-dev/2014-March/030763.html [2] https://review.openstack.org/#/c/94001/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo][glance] what to do about tons of http connection pool is full warnings in g-api log?
On 7/14/2014 5:18 PM, Ben Nemec wrote: On 07/14/2014 04:21 PM, Matt Riedemann wrote: On 7/14/2014 4:09 PM, Matt Riedemann wrote: I opened bug 1341777 [1] against glance but it looks like it's due to the default log level for requests.packages.urllib3.connectionpool in oslo's log module. The problem is this warning shows up nearly 420K times in 7 days in Tempest runs: WARNING urllib3.connectionpool [-] HttpConnectionPool is full, discarding connection: 127.0.0.1 So either glance is doing something wrong, or that's logging too high of a level (I think it should be debug in this case). I'm not really sure how to scope this down though, or figure out what is so damn chatty in glance-api that is causing this. It doesn't seem to be causing test failures, but the rate at which this is logged in glance-api is surprising. [1] https://bugs.launchpad.net/glance/+bug/1341777 I found this older thread [1] which led to this in oslo [2] but I'm not really sure how to use it to make the connectionpool logging quieter in glance, any guidance there? It looks like in Joe's change to nova for oslo.messaging he just changed the value directly in the log module in nova, something I thought was forbidden. [1] http://lists.openstack.org/pipermail/openstack-dev/2014-March/030763.html [2] https://review.openstack.org/#/c/94001/ There was a change recently in incubator to address something related, but since it's setting to WARN I don't think it would get rid of this message: https://github.com/openstack/oslo-incubator/commit/3310d8d2d3643da2fc249fdcad8f5000866c4389 It looks like Joe's change was a cherry-pick of the incubator change to add oslo.messaging, so discouraged but not forbidden (and apparently during feature freeze, which is understandable). -Ben ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Yeah it sounds like either a problem in glance because they don't allow configuring the max pool size so it defaults to 1, or it's an issue in python-swiftclient and is being tracked in a different bug: https://bugs.launchpad.net/python-swiftclient/+bug/1295812 -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][qa] proposal for moving forward on cells/tempest testing
On 7/15/2014 12:36 AM, Sean Dague wrote: On 07/14/2014 07:44 PM, Matt Riedemann wrote: Today we only gate on exercises in devstack for cells testing coverage in the gate-devstack-dsvm-cells job. The cells tempest non-voting job was moving to the experimental queue here [1] since it doesn't work with a lot of the compute API tests. I think we all agreed to tar and feather comstud if he didn't get Tempest working (read: passing) with cells enabled in Juno. The first part of this is just figuring out where we sit with what's failing in Tempest (in the check-tempest-dsvm-cells-full job). I'd like to propose that we do the following to get the ball rolling: 1. Add an option to tempest.conf under the compute-feature-enabled section to toggle cells and then use that option to skip tests that we know will fail in cells, e.g. security group tests. I don't think we should do that. Part of creating the feature matrix in devstack gate included the follow on idea of doing extension selection based on branch or feature. I'm happy if that gets finished, then tests are skipped by known not working extensions, but just landing a ton of tempest ifdefs that will all be removed is feeling very gorpy. Especially as we're now at Juno 2, which was supposed to be the checkpoint for this being on track for completion and... people are just talking about starting. 2. Open bugs for all of the tests we're skipping so we can track closing those down, assuming they aren't already reported. [2] 3. Once the known failures are being skipped, we can move check-tempest-dsvm-cells-full out of the experimental queue. I'm not proposing that it'd be voting right away, I think we have to see it burn in for awhile first. With at least this plan we should be able to move forward on identifying issues and getting some idea for how much of Tempest doesn't work with cells and the effort involved in making it work. Thoughts? If there aren't any objections, I said I'd work on the qa-spec and can start doing the grunt-work of opening bugs and skipping tests. [1] https://review.openstack.org/#/c/87982/ [2] https://bugs.launchpad.net/nova/+bugs?field.tag=cells+ All the rest is fine, I just think we should work on the proper way to skip things. -Sean ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev OK I don't know anything about the extensions in devstack-gate or how the skips would work then, I'll have to bug some people in IRC unless there is an easy example that can be pointed out here. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo][glance] what to do about tons of http connection pool is full warnings in g-api log?
On 7/14/2014 5:28 PM, Matt Riedemann wrote: On 7/14/2014 5:18 PM, Ben Nemec wrote: On 07/14/2014 04:21 PM, Matt Riedemann wrote: On 7/14/2014 4:09 PM, Matt Riedemann wrote: I opened bug 1341777 [1] against glance but it looks like it's due to the default log level for requests.packages.urllib3.connectionpool in oslo's log module. The problem is this warning shows up nearly 420K times in 7 days in Tempest runs: WARNING urllib3.connectionpool [-] HttpConnectionPool is full, discarding connection: 127.0.0.1 So either glance is doing something wrong, or that's logging too high of a level (I think it should be debug in this case). I'm not really sure how to scope this down though, or figure out what is so damn chatty in glance-api that is causing this. It doesn't seem to be causing test failures, but the rate at which this is logged in glance-api is surprising. [1] https://bugs.launchpad.net/glance/+bug/1341777 I found this older thread [1] which led to this in oslo [2] but I'm not really sure how to use it to make the connectionpool logging quieter in glance, any guidance there? It looks like in Joe's change to nova for oslo.messaging he just changed the value directly in the log module in nova, something I thought was forbidden. [1] http://lists.openstack.org/pipermail/openstack-dev/2014-March/030763.html [2] https://review.openstack.org/#/c/94001/ There was a change recently in incubator to address something related, but since it's setting to WARN I don't think it would get rid of this message: https://github.com/openstack/oslo-incubator/commit/3310d8d2d3643da2fc249fdcad8f5000866c4389 It looks like Joe's change was a cherry-pick of the incubator change to add oslo.messaging, so discouraged but not forbidden (and apparently during feature freeze, which is understandable). -Ben ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Yeah it sounds like either a problem in glance because they don't allow configuring the max pool size so it defaults to 1, or it's an issue in python-swiftclient and is being tracked in a different bug: https://bugs.launchpad.net/python-swiftclient/+bug/1295812 It looks like the issue for the g-api logs was bug 1295812 in python-swiftclient, around the time that moved to using python-requests. I noticed last night that the n-cpu/c-vol logs started spiking with the urllib3 connectionpool warning on 7/11 which is when python-glanceclient started using requests, so I've changed bug 1341777 to a python-glanceclient bug. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
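In the meantime, anyone drowning in these warnings locally can turn the logger down directly (a workaround sketch using the stdlib, not the oslo default_log_levels route):

    import logging

    # requests vendors urllib3, so cover both possible logger names:
    for name in ('urllib3.connectionpool',
                 'requests.packages.urllib3.connectionpool'):
        logging.getLogger(name).setLevel(logging.ERROR)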
[openstack-dev] python-glanceclient with requests is spamming the logs
I've been looking at bug 1341777 since yesterday, originally because of the g-api logs and this warning: HttpConnectionPool is full, discarding connection: 127.0.0.1 But that's been around a while and it sounds like an issue with python-swiftclient since it started using python-requests (see bug 1295812). I also noticed that the warning started spiking in the n-cpu and c-vol logs on 7/11 and traced that back to this change in python-glanceclient to start using requests: https://review.openstack.org/#/c/78269/ This is nasty because it's generating around 166K warnings since 7/11 in those logs: http://goo.gl/p0urYm It's a big change in glanceclient so I wouldn't want to propose a revert for this, but hopefully the glance team can sort this out quickly since it's going to impact our elastic search cluster. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] request to tag novaclient 2.18.0
On 7/17/2014 5:48 PM, Steve Baker wrote: On 18/07/14 00:44, Joe Gordon wrote: On Wed, Jul 16, 2014 at 11:28 PM, Steve Baker sba...@redhat.com wrote: On 12/07/14 09:25, Joe Gordon wrote: On Fri, Jul 11, 2014 at 4:42 AM, Jeremy Stanley fu...@yuggoth.org wrote: On 2014-07-11 11:21:19 +0200 (+0200), Matthias Runge wrote: this broke horizon stable and master; heat stable is affected as well. [...] I guess this is a plea for applying something like the oslotest framework to client libraries so they get backward-compat jobs run against unit tests of all dependent/consuming software... branchless tempest already alleviates some of this, but not the case of changes in a library which will break unit/functional tests of another project. We actually do have some tests for backwards compatibility, and they all passed. Presumably because both heat and horizon have poor integration tests. We ran * check-tempest-dsvm-full-havana http://logs.openstack.org/66/94166/3/check/check-tempest-dsvm-full-havana/8e09faa SUCCESS in 40m 47s (non-voting) * check-tempest-dsvm-neutron-havana http://logs.openstack.org/66/94166/3/check/check-tempest-dsvm-neutron-havana/b4ad019 SUCCESS in 36m 17s (non-voting) * check-tempest-dsvm-full-icehouse http://logs.openstack.org/66/94166/3/check/check-tempest-dsvm-full-icehouse/c0c62e5 SUCCESS in 53m 05s * check-tempest-dsvm-neutron-icehouse http://logs.openstack.org/66/94166/3/check/check-tempest-dsvm-neutron-icehouse/a54aedb SUCCESS in 57m 28s on the offending patches (https://review.openstack.org/#/c/94166/) Infra patch that added these tests: https://review.openstack.org/#/c/80698/ Heat-proper would have continued working fine with novaclient 2.18.0. The regression was with raising novaclient exceptions, which is only required in our unit tests. I saw this break coming and switched to raising via from_response https://review.openstack.org/#/c/97977/22/heat/tests/v1_1/fakes.py Unit tests tend to deal with more internals of client libraries just for mocking purposes, and there have been multiple breaks in unit tests for heat and horizon when client libraries make internal changes. This could be avoided if the client gate jobs run the unit tests for the projects which consume them. That may work but isn't this exactly what integration testing is for? If you mean tempest then no, this is different. Client projects have done a good job of keeping their public library APIs stable. An exception type is public API, but the constructor for raising that type arguably is more of a gray area since only the client library should be raising its own exceptions. However heat and horizon unit tests need to raise client exceptions to test their own error condition handling, so exception constructors could be considered public API, but only for unit test mocking in other projects. This problem couldn't have been caught in an integration test because nothing outside the unit tests directly raises a client exception. There have been other breakages where internal client library changes have broken the mocking in our unit tests (I recall a neutronclient internal refactor). In many cases the cause may be inappropriate mocking in the unit tests, but that is cold comfort when the gates break when a client library is released.
Maybe we can just start with adding heat and horizon to the check jobs of the clients they consume, but the following should also be considered: grep python-.*client */requirements.txt This could give client libraries more confidence that internal changes don't break anything, and allows them to fix mocking in other projects before their changes land. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I don't think we should have to change the gate jobs just so that other projects can test against the internals of their dependent clients; that sounds like a flawed unit test design to me. Looking at https://review.openstack.org/#/c/97977/22/heat/tests/v1_1/fakes.py for example, why is a fake_exception needed to mock out novaclient's NotFound exception? A better way to do this is that whatever is expecting to raise the NotFound should use mock with a side_effect to raise novaclient.exceptions.NotFound; mock then handles the spec being set on the mock and you don't have to worry about the internal construction of the exception class in your unit tests. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org
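To illustrate the suggested pattern (the test, patch target and handler names are made up):

    import mock

    from novaclient import exceptions as nova_exceptions

    def test_handles_missing_server(self):
        # Raise the real exception type via side_effect; mock takes care of
        # the rest, so internal changes to how novaclient constructs its
        # exceptions can't break this test.
        with mock.patch.object(self.client.servers, 'get',
                               side_effect=nova_exceptions.NotFound(404)):
            self.assertRaises(exception.ServerNotFound,
                              self.unit_under_test, 'fake-server-id')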
Re: [openstack-dev] [gate] Automatic elastic rechecks
On 7/17/2014 9:01 AM, Matthew Booth wrote: Elastic recheck is a great tool. It leaves me messages like this: === I noticed jenkins failed, I think you hit bug(s): check-devstack-dsvm-cells: https://bugs.launchpad.net/bugs/1334550 gate-tempest-dsvm-large-ops: https://bugs.launchpad.net/bugs/1334550 We don't automatically recheck or reverify, so please consider doing that manually if someone hasn't already. For a code review which is not yet approved, you can recheck by leaving a code review comment with just the text: recheck bug 1334550 For bug details see: http://status.openstack.org/elastic-recheck/ === In an ideal world, every person seeing this would diligently check that the fingerprint match was accurate before submitting a recheck request. In the real world, how about we just do it automatically? Matt We don't want automatic rechecks because then we're just piling on to races: you can have Jenkins failures where we have a fingerprint for one job failure, but some other job failing on your patch is an unrecognized failure (no e-r fingerprint query yet). If we never force people to investigate the failures and write fingerprints because we're just always automatically rechecking things for them, we'll drop our categorization rates and most likely eventually fall into a locked gate once we hit 2-3 really nasty races hitting at the same time. So the best way to avoid a locked gate is to stay on top of managing the worst offenders and making sure everyone is actually looking at what failed so we can quickly identify new races. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] About the ERROR:cliff.app Service Unavailable during deploy openstack by devstack.
On 7/14/2014 3:47 AM, Meng Jie MJ Li wrote: Hi, I tried to use devstack to deploy openstack, but encountered an issue: ERROR: cliff.app Service Unavailable (HTTP 503). I tried several times, all with the same result. 2014-07-14 05:53:39.430 | + create_keystone_accounts 2014-07-14 05:53:39.431 | ++ get_or_create_project admin 2014-07-14 05:53:39.433 | +++ openstack project show admin -f value -c id 2014-07-14 05:53:40.147 | +++ openstack project create admin -f value -c id 2014-07-14 05:53:40.771 | ERROR: cliff.app Service Unavailable (HTTP 503) 2014-07-14 05:53:41.519 | +++ openstack user create admin --password admin --project --email ad...@example.com -f value -c id 2014-07-14 05:53:42.080 | usage: openstack user create [-h] [-f {shell,table,value}] [-c COLUMN] 2014-07-14 05:53:42.080 | [--max-width integer] [--prefix PREFIX] 2014-07-14 05:53:42.080 | [--password user-password] [--password-prompt] 2014-07-14 05:53:42.080 | [--email user-email] [--project project] 2014-07-14 05:53:42.080 | [--enable | --disable] 2014-07-14 05:53:42.080 | user-name 2014-07-14 05:53:42.081 | openstack user create: error: argument --project: expected one argument 2014-07-14 05:53:42.109 | ++ USER_ID= 2014-07-14 05:53:42.109 | ++ echo 2014-07-14 05:53:42.109 | + ADMIN_USER= 2014-07-14 05:53:42.110 | ++ get_or_create_role admin 2014-07-14 05:53:42.111 | +++ openstack role show admin -f value -c id 2014-07-14 05:53:42.682 | +++ openstack role create admin -f value -c id 2014-07-14 05:53:43.235 | ERROR: cliff.app Service Unavailable (HTTP 503) By checking in Google, I found someone who encountered the same problem, logged in https://bugs.launchpad.net/devstack/+bug/129, and I tried the workaround but it didn't work. The workaround I tried is below. = 1st, I tried setting HOST_IP to 127.0.0.1. Next, I set it to 9.21.xxx.xxx, which is the address of my eth0 interface, and added export no_proxy=localhost,127.0.0.1,9.21.xxx.xxx Neither of these fixed the problem. My localrc file: HOST_IP=9.21.xxx.xxx FLAT_INTERFACE=eth0 #FIXED_RANGE=10.4.128.0/20 #FIXED_NETWORK_SIZE=4096 #FLOATING_RANGE=192.168.42.128/25 MULTI_HOST=1 LOGFILE=/opt/stack/logs/stack.sh.log ADMIN_PASSWORD=admin MYSQL_PASSWORD=admin RABBIT_PASSWORD=admin SERVICE_PASSWORD=admin SERVICE_TOKEN=xyzpdqlazydog === Any help appreciated Regards Mengjie ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev There was a recent change to devstack to default to running keystone in apache; that might be what you're hitting. There is an env var to disable that so it doesn't run in apache, but you'd have to look up the change for the details. Should be in the devstack/lib/keystone file history. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
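If it is the apache change, the devstack toggle at the time looked like this in localrc (the variable name is from memory; treat it as an assumption and verify it against lib/keystone):

    # localrc: run keystone under eventlet instead of apache + mod_wsgi
    KEYSTONE_USE_MOD_WSGI=False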
[openstack-dev] [neutron] requesting python-neutronclient release for MacAddressInUseClient exception
Nova needs a python-neutronclient release to use the new MacAddressInUseClient exception type defined here [1]. [1] https://review.openstack.org/#/c/109052/ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev