Re: [openstack-dev] [Nova] [feature freeze exception] Move to oslo.db
I'm good with this one too, so that makes three if Joe is OK with this. @Josh -- can you please take a look at the TH failures? Thanks, Michael

On Wed, Sep 3, 2014 at 8:10 PM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 9/3/2014 5:08 PM, Andrey Kurilin wrote: Hi All! I'd like to ask for a feature freeze exception for porting nova to use oslo.db. This change not only removes 3k LOC, but also fixes 4 bugs (see the commit message for more details) and provides relevant, stable common db code. The main maintainers of oslo.db (Roman Podoliaka and Victor Sergeyev) are OK with this. Joe Gordon and Matt Riedemann have already signed up, so we need one more vote from a core developer. By the way, a lot of core projects have already been using oslo.db for a while: keystone, cinder, glance, ceilometer, ironic, heat, neutron and sahara. So migration to oslo.db won't produce any unexpected issues. Patch is here: https://review.openstack.org/#/c/101901/ -- Best regards, Andrey Kurilin.

Just re-iterating my agreement to sponsor this. I'm waiting for the latest patch set to pass Jenkins and for Roman to review after his comments from the previous patch set and -1. Otherwise I think this is nearly ready to go. The turbo-hipster failures on the change appear to be infra issues in t-h rather than problems with the code. -- Thanks, Matt Riedemann

-- Rackspace Australia
Re: [openstack-dev] Unexpected error in OpenStack Nova
Hi Hossein, openstack-dev is a development mailing list, focused on the future of OpenStack and the development thereof. I would recommend that you address your question (with appropriate debug log output) to the openstack-operators mailing list. Best regards, Jesse

On 3 September 2014 21:46, Hossein Zabolzadeh zabolza...@gmail.com wrote: Any idea?

On Wed, Sep 3, 2014 at 6:41 PM, Hossein Zabolzadeh zabolza...@gmail.com wrote: Hi, after the successful installation of both keystone and nova, I tried to execute the 'nova list' command with the following env variables (my deployment model is single-machine deployment): export OS_USERNAME=admin export OS_PASSWORD=... export OS_TENANT_NAME=service export OS_AUTH_URL=http://10.0.0.1:5000 But the following unknown error occurred: ERROR: attribute 'message' of 'exceptions.BaseException' objects (HTTP 300)
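For what it's worth, the HTTP 300 here is usually keystone's version-negotiation response: an unversioned OS_AUTH_URL makes the identity service answer 300 Multiple Choices with a list of its API versions, which the old client then fails to turn into a readable error. A quick way to confirm (a sketch, assuming the endpoint from the original post is reachable):

    import requests

    # The unversioned keystone root typically answers
    # "300 Multiple Choices" with a JSON list of API versions.
    resp = requests.get("http://10.0.0.1:5000")
    print(resp.status_code)   # 300
    print(resp.json())        # {"versions": {"values": [...]}}

    # Pointing clients at a versioned endpoint avoids the negotiation:
    #   export OS_AUTH_URL=http://10.0.0.1:5000/v2.0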
[openstack-dev] [Zaqar] Comments on the concerns that arose during the TC meeting
Greetings, Last Tuesday the TC held the first graduation review for Zaqar. During the meeting some concerns arose. I've listed those concerns below with some comments, hoping that it will help start a discussion before the next meeting. In addition, I've added some comments about the project stability at the bottom and an etherpad link pointing to a list of use cases for Zaqar.

# Concerns

- Concern on operational burden of requiring NoSQL deploy expertise to the mix of openstack operational skills

For those of you not familiar with Zaqar, it currently supports two NoSQL drivers - MongoDB and Redis - and those are the only two drivers it supports for now. This will require operators willing to use Zaqar to maintain a new (?) NoSQL technology in their system. Before expressing our thoughts on this matter, let me say that: 1. By removing the SQLAlchemy driver, we basically removed the chance for operators to use an already deployed OpenStack-technology 2. Zaqar won't be backed by any AMQP based messaging technology for now. Here's[0] a summary of the research the team (mostly done by Victoria) did during Juno 3. We (OpenStack) used to require Redis for the zmq matchmaker 4. We (OpenStack) also use memcached for caching and, as the oslo caching lib becomes available - or a wrapper on top of dogpile.cache - Redis may be used in place of memcached in more and more deployments. 5. Ceilometer's recommended storage driver is still MongoDB, although Ceilometer now has support for sqlalchemy. (Please correct me if I'm wrong).

That being said, it's obvious we already, to some extent, promote some NoSQL technologies. However, for the sake of the discussion, let's assume we don't. I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't keep avoiding these technologies. NoSQL technologies have been around for years and we should be prepared - including OpenStack operators - to support these technologies. Not every tool is good for all tasks - one of the reasons we removed the sqlalchemy driver in the first place - therefore it's impossible to keep a homogeneous environment for all services. With this, I'm not suggesting we ignore the risks and the extra burden this adds but, instead of attempting to avoid it completely by not evolving the stack of services we provide, we should probably work on defining a reasonable subset of NoSQL services we are OK with supporting. This will help make the burden smaller and it'll give operators the option to choose. [0] http://blog.flaper87.com/post/marconi-amqp-see-you-later/

- Concern on should we really reinvent a queue system rather than piggyback on one

As mentioned in the meeting on Tuesday, Zaqar is not reinventing message brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack flavor on top. [0] Some things that differentiate Zaqar from SQS are its capability for supporting different protocols without sacrificing multi-tenancy and the other intrinsic features it provides. Some protocols you may consider for Zaqar are: STOMP, MQTT. As far as the backend goes, Zaqar is not re-inventing it either. It sits on top of existing storage technologies that have proven to be fast and reliable for this task. The choice of using NoSQL technologies has a lot to do with this particular thing and the fact that Zaqar needs storage capable of scaling and replicating, with good support for failover. [0] https://wiki.openstack.org/wiki/Zaqar/Frequently_asked_questions#Is_Zaqar_a_provisioning_service_or_a_data_API.3F

- concern on dataplane vs.
controlplane, should we add more dataplane things in the integrated release

I'm really not sure I understand the arguments against dataplane services. What concerns do people have about these services? As far as I can tell, we already have several services - some in the lower layers - that provide a data plane API. For example: * keystone (service catalogs and tokens) * glance (image management) * swift (object storage) * ceilometer (metrics) * heat (provisioning) * barbican (key management) Are the concerns specific to Zaqar's dataplane API?

- concern on API v2 being already planned

At the meeting, we discussed Zaqar's API a bit and, more importantly, how stable it is. During that discussion I mentioned a hypothetical v2 of the API. I'd like to clarify that a v2 is not being planned for Kilo; what we would like to do is gather feedback from the community and from services consuming Zaqar about the existing API, and use that feedback to design a new version of the API if necessary. All this has yet to be discussed but, most importantly, we would first like to get more feedback from the community. We have already gotten some feedback, but it has been fairly limited because most people are waiting for us to graduate before kicking the tires. We do have some endpoints that will go away in the API v2 - getting messages by id,
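To ground the SQS comparison with something concrete, this is roughly what producing and consuming messages looks like against Zaqar's v1 HTTP API (a sketch based on the v1 docs of the era; the endpoint and token are placeholders):

    import json
    import uuid
    import requests

    BASE = "http://zaqar.example.com:8888"      # placeholder endpoint
    HDRS = {
        "Client-ID": str(uuid.uuid4()),         # v1 requires a client UUID
        "X-Auth-Token": "TOKEN",                # keystone token (placeholder)
        "Content-Type": "application/json",
    }

    # Post a message with a 5-minute TTL.
    requests.post(BASE + "/v1/queues/demo/messages", headers=HDRS,
                  data=json.dumps([{"ttl": 300, "body": {"event": "test"}}]))

    # Claim messages so other workers won't see them, then delete each
    # one after processing -- the claim/delete cycle is what gives
    # reliable, SQS-style delivery.
    claim = requests.post(BASE + "/v1/queues/demo/claims", headers=HDRS,
                          data=json.dumps({"ttl": 300, "grace": 60}))
    if claim.status_code == 201:                # 204 means queue was empty
        for msg in claim.json():
            requests.delete(BASE + msg["href"], headers=HDRS)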
Re: [openstack-dev] Unexpected error in OpenStack Nova
Hi Jesse, Thanks for your help. I'll continue my discussion on the other related mailing list.

On Thu, Sep 4, 2014 at 11:23 AM, Jesse Pretorius jesse.pretor...@gmail.com wrote: Hi Hossein, openstack-dev is a development mailing list, focused on the future of OpenStack and the development thereof. I would recommend that you address your question (with appropriate debug log output) to the openstack-operators mailing list. Best regards, Jesse

On 3 September 2014 21:46, Hossein Zabolzadeh zabolza...@gmail.com wrote: Any idea?

On Wed, Sep 3, 2014 at 6:41 PM, Hossein Zabolzadeh zabolza...@gmail.com wrote: Hi, after the successful installation of both keystone and nova, I tried to execute the 'nova list' command with the following env variables (my deployment model is single-machine deployment): export OS_USERNAME=admin export OS_PASSWORD=... export OS_TENANT_NAME=service export OS_AUTH_URL=http://10.0.0.1:5000 But the following unknown error occurred: ERROR: attribute 'message' of 'exceptions.BaseException' objects (HTTP 300)
Re: [openstack-dev] [Nova] [feature freeze exception] Move to oslo.db
Hey, Yep, I became aware of these this afternoon. The negative votes are due to a bad nodepool image. I've rebuilt them and am working on clearing the backlog. Sorry for the issues. Cheers, Josh Rackspace Australia

On 9/4/14 4:30 PM, Michael Still wrote: I'm good with this one too, so that makes three if Joe is OK with this. @Josh -- can you please take a look at the TH failures? Thanks, Michael

On Wed, Sep 3, 2014 at 8:10 PM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 9/3/2014 5:08 PM, Andrey Kurilin wrote: Hi All! I'd like to ask for a feature freeze exception for porting nova to use oslo.db. This change not only removes 3k LOC, but also fixes 4 bugs (see the commit message for more details) and provides relevant, stable common db code. The main maintainers of oslo.db (Roman Podoliaka and Victor Sergeyev) are OK with this. Joe Gordon and Matt Riedemann have already signed up, so we need one more vote from a core developer. By the way, a lot of core projects have already been using oslo.db for a while: keystone, cinder, glance, ceilometer, ironic, heat, neutron and sahara. So migration to oslo.db won't produce any unexpected issues. Patch is here: https://review.openstack.org/#/c/101901/ -- Best regards, Andrey Kurilin.

Just re-iterating my agreement to sponsor this. I'm waiting for the latest patch set to pass Jenkins and for Roman to review after his comments from the previous patch set and -1. Otherwise I think this is nearly ready to go. The turbo-hipster failures on the change appear to be infra issues in t-h rather than problems with the code. -- Thanks, Matt Riedemann
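For readers who haven't looked at the library: the port mostly replaces nova's private engine/session plumbing with oslo.db's EngineFacade. A minimal sketch of the consuming pattern (Juno-era oslo.db API; the connection string is a placeholder):

    from oslo.db.sqlalchemy import session as db_session

    # One facade per database; it owns the engine and the session maker.
    _FACADE = db_session.EngineFacade("sqlite:///:memory:")

    def get_session(**kwargs):
        return _FACADE.get_session(**kwargs)

    # Typical usage inside a DB API function:
    session = get_session()
    with session.begin():
        session.execute("SELECT 1")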
Re: [openstack-dev] [oslo] [infra] Alpha wheels for Python 3.x
On Wed, Sep 3, 2014 at 7:24 PM, Doug Hellmann d...@doughellmann.com wrote: On Sep 3, 2014, at 5:27 AM, Yuriy Taraday yorik@gmail.com wrote: On Tue, Sep 2, 2014 at 11:17 PM, Clark Boylan cboy...@sapwetik.org wrote: It has been pointed out to me that one case where it won't be so easy is oslo.messaging and its use of eventlet under python2. Messaging will almost certainly need python 2 and python 3 wheels to be separate. I think we should continue to use universal wheels where possible and only build python2 and python3 wheels in the special cases where necessary.

We can make eventlet an optional dependency of oslo.messaging (through setuptools' extras). In fact I don't quite understand the need for eventlet as a direct dependency there, since we can just write code that uses the threading library and it'll get monkey-patched if the consumer app wants to use eventlet.

There is code in the messaging library that makes calls directly into eventlet now, IIRC. It sounds like that could be changed, but that's something to consider for a future version.

Yes, I hope to see a unified threading/eventlet executor there (futures-based, I guess) some day.

The last time I looked at setuptools extras they were a documented but unimplemented specification. Has that changed?

According to the docs [1] it works in pip (and has been working in setuptools for ages), and according to bug [2], it has been working for a couple of years. [1] http://pip.readthedocs.org/en/latest/reference/pip_install.html#examples (#6) [2] https://github.com/pypa/pip/issues/7 -- Kind regards, Yuriy.
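For concreteness, declaring eventlet as an optional dependency through extras would look something like the following (a sketch; the extra's name, version pin, and package layout are made up):

    from setuptools import setup

    setup(
        name="oslo.messaging",
        version="0.0.0",
        packages=["oslo_messaging"],
        install_requires=["six"],          # unconditional dependencies
        extras_require={
            # pulled in only via: pip install oslo.messaging[eventlet]
            "eventlet": ["eventlet>=0.13"],
        },
    )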
Re: [openstack-dev] [oslo] [infra] Alpha wheels for Python 3.x
On Wed, Sep 3, 2014 at 8:21 PM, Doug Hellmann d...@doughellmann.com wrote: On Sep 3, 2014, at 11:57 AM, Clark Boylan cboy...@sapwetik.org wrote: On Wed, Sep 3, 2014, at 08:22 AM, Doug Hellmann wrote: On Sep 2, 2014, at 3:17 PM, Clark Boylan cboy...@sapwetik.org wrote: The setup.cfg classifiers should be able to do that for us, though PBR may need updating? We will also need to learn to upload potentially >1 wheel.

How do you see that working? We want all of the Oslo libraries to, eventually, support both python 2 and 3. How would we use the classifiers to tell when to build a universal wheel and when to build separate wheels?

The classifiers provide info on the versions of python we support. By default we can build a python2 wheel if only 2 is supported, build a python3 wheel if only 3 is supported, and build a universal wheel if both are supported. Then we can add a setup.cfg flag to override the universal wheel default and build both a python2 and a python3 wheel instead. Dstufft and mordred should probably comment on this idea before we implement anything.

OK. I'm not aware of any python-3-only projects, and the flag to override the universal wheel is the piece I was missing. I think there's already a setuptools flag related to whether or not we should build universal wheels, isn't there?

I think we should rely on the wheel.universal flag from setup.cfg if it's there. If it's set, we should always build universal wheels. If it's not set, we should look in the classifiers and build wheels for the Python versions that are mentioned there. -- Kind regards, Yuriy.
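The decision procedure sketched above is small enough to write down; an illustrative implementation (using Python 3's configparser, and not what pbr actually does) that honors an explicit [wheel] universal flag and otherwise falls back to the trove classifiers:

    import configparser

    def wheel_targets(path="setup.cfg"):
        cfg = configparser.ConfigParser()
        cfg.read(path)

        # Explicit override wins: [wheel] universal = 1
        if cfg.getboolean("wheel", "universal", fallback=False):
            return ["universal"]

        classifiers = cfg.get("metadata", "classifier", fallback="")
        py2 = "Programming Language :: Python :: 2" in classifiers
        py3 = "Programming Language :: Python :: 3" in classifiers
        if py2 and py3:
            return ["universal"]       # both supported -> one wheel
        return ["py2"] if py2 else ["py3"]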
Re: [openstack-dev] [oslo] [infra] Alpha wheels for Python 3.x
On Thu, Sep 4, 2014 at 4:47 AM, Jeremy Stanley fu...@yuggoth.org wrote: On 2014-09-03 13:27:55 +0400 (+0400), Yuriy Taraday wrote: [...] Maybe we should drop 3.3 already?

It's in progress. Search review.openstack.org for open changes in all projects with the topic py34. Shortly I'll also have some infra config changes up to switch python33 jobs out for python34, ready to drop once the j-3 milestone has been tagged and is finally behind us.

Great! Looking forward to purging python 3.3 from my system. -- Kind regards, Yuriy.
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 09/04/2014 01:37 AM, Robert Collins wrote: On 4 September 2014 00:13, Eoghan Glynn egl...@redhat.com wrote: On 09/02/2014 11:33 PM, Robert Collins wrote: The implementation in ceilometer is very different to the Ironic one - are you saying the test you linked fails with Ironic, or that it fails with the ceilometer code today?

Disclaimer: in Ironic terms, node = conductor, key = host. The test I linked fails with the Ironic hash ring code (specifically the part that tests consistency). With 1000 keys being mapped to 10 nodes, when you add a node: - current ceilometer code remaps around 7% of the keys (~1/#nodes) - Ironic code remaps 90% of the keys

So just to underscore what Nejc is saying here ... The key point is the proportion of such baremetal-nodes that would end up being re-assigned when a new conductor is fired up.

That was 100% clear, but thanks for making sure. The question was getting a proper understanding of why it was happening in Ironic. The ceilometer hashring implementation is good, but it uses the same terms very differently (e.g. replicas for partitions) - I'm adapting the key fix back into Ironic - I'd like to see us converge on a single implementation, and making sure the Ironic one is suitable for ceilometer seems applicable here (since ceilometer seems to need less from the API),

I used the terms that are used in the original caching use-case, as described in [1], and which are used in the pypi lib as well [2]. With the correct approach, there aren't actually any partitions; 'replicas' actually denotes the number of times you hash a node onto the ring. As for nodes/keys, what's your suggestion? I've opened a bug[3], so you can add a Closes-Bug to your patch. [1] http://www.martinbroadhurst.com/Consistent-Hash-Ring.html [2] https://pypi.python.org/pypi/hash_ring [3] https://bugs.launchpad.net/ironic/+bug/1365334

If reassigning was cheap Ironic wouldn't have bothered having a hash ring :) -Rob
[openstack-dev] [nova][api] can not show the soft_deleted instance with nova list
Hi all: I found nova can not list the soft-deleted instances (neither v2 nor v3 shows them), but novaclient help suggests otherwise:

[tagett@stack-01 devstack]$ nova help restore usage: nova restore server Restore a soft-deleted server.

How can I restore a soft-deleted server? We can not list the soft-deleted instances:

[tagett@stack-01 devstack]$ nova list --all-tenants --status SOFT_DELETED ++--+++-+--+ | ID | Name | Status | Task State | Power State | Networks | ++--+++-+--+ ++--+++-+--+

But from the database we can see this status:

MariaDB [nova]> select hostname, vm_state from instances; | vm1 | soft-delete | | vm | soft-delete |

I checked the v3 code; it doesn't even support restore. How will this go? I found some discussion at https://review.openstack.org/#/c/35061/2 - any comments on that? Is this by design? Are we abandoning the restore operation? Besides, I have a WIP patch to have soft-deleted instances listed in the v3 API: https://review.openstack.org/#/c/118641/ -- Thanks, Eli (Li Yong) Qiao
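Until the API gap is settled, the same calls through python-novaclient look roughly like this (a sketch; the credentials are placeholders, and the empty SOFT_DELETED listing is exactly the bug being discussed):

    from novaclient.v1_1 import client

    nova = client.Client("admin", "password", "service",
                         "http://10.0.0.1:5000/v2.0")   # placeholder creds

    # Ask for soft-deleted instances across tenants; today this comes
    # back empty even though the DB shows vm_state = soft-delete.
    servers = nova.servers.list(search_opts={"all_tenants": 1,
                                             "status": "SOFT_DELETED"})

    # Restore would bring one back before the deferred-delete
    # reclaim interval expires.
    for server in servers:
        nova.servers.restore(server)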
[openstack-dev] [nova] [feature freeze exception] FFE for instance tags API extension
Hello. I'd like to ask for a feature freeze exception for the instance tags API extension: https://review.openstack.org/#/c/97168/ https://review.openstack.org/#/c/103553/ https://review.openstack.org/#/c/107712/ approved spec: https://review.openstack.org/#/c/91444/ The blueprint was approved, but its status was changed to Pending Approval because of FF: https://blueprints.launchpad.net/nova/+spec/tag-instances

The first of these patches has a +2 from Jay Pipes. The second was already approved by Jay Pipes and Matt Dietz. This set of patches was pretty close to merging when FF came. In most popular REST API interfaces, objects in the domain model can be tagged with zero or more simple strings. This feature will allow normal users to add, remove and list tags for an instance and to filter instances by tags. These changes will also make it possible to tag other nova objects in the future, because the Tag object introduced here is generic and any nova object with an id could be tagged by it. Please consider this feature to be a part of the Juno-3 release.
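For reviewers skimming the request, the user-facing surface is small; roughly the following calls, per the approved spec (shown with requests; the endpoint, token and UUID are placeholders and the paths are illustrative):

    import requests

    NOVA = "http://nova.example.com:8774/v2/TENANT_ID"   # placeholder
    HDRS = {"X-Auth-Token": "TOKEN"}                     # placeholder
    srv = "SERVER_UUID"

    # Add a tag, list tags, filter servers by tag, remove a tag.
    requests.put("%s/servers/%s/tags/production" % (NOVA, srv), headers=HDRS)
    tags = requests.get("%s/servers/%s/tags" % (NOVA, srv), headers=HDRS).json()
    hits = requests.get(NOVA + "/servers?tags=production", headers=HDRS).json()
    requests.delete("%s/servers/%s/tags/production" % (NOVA, srv), headers=HDRS)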
[openstack-dev] [OpenStack-dev][neutron] Some questions and opinions about the network model port.
Hello guys, port is a very important model concept for network programming, and I have some doubts about that concept now. Maybe it is still necessary to discuss the definition details of port and give some suggestions. For instance, when we connect a network to a router, neutron will create an interface for the router; actually it is implemented as a port. From the topology perspective, using the link concept is closer to a real description. There is a question here: does the network need a port to deploy control policy? I think so. We can treat a port as a property of a kind of network node. Without the node, a port will lose its significance. So, shall we emphasize the port as a property in the network model? Looking forward to everyone's opinions. Thanks, Shixing Liu
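As a concrete illustration of the router-interface-is-a-port point (python-neutronclient, with placeholder credentials and IDs):

    from neutronclient.v2_0 import client

    neutron = client.Client(username="admin", password="secret",
                            tenant_name="admin",
                            auth_url="http://10.0.0.1:5000/v2.0")

    # Attaching a subnet to a router implicitly creates a port ...
    neutron.add_interface_router("ROUTER_ID", {"subnet_id": "SUBNET_ID"})

    # ... which then shows up in the port list, owned by the router.
    for port in neutron.list_ports(device_id="ROUTER_ID")["ports"]:
        print(port["id"], port["device_owner"])  # network:router_interface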
Re: [openstack-dev] [Zaqar] Comments on the concerns that arose during the TC meeting
Excerpts from Flavio Percoco's message of 2014-09-04 00:08:47 -0700: Greetings, Last Tuesday the TC held the first graduation review for Zaqar. During the meeting some concerns arose. I've listed those concerns below with some comments, hoping that it will help start a discussion before the next meeting. In addition, I've added some comments about the project stability at the bottom and an etherpad link pointing to a list of use cases for Zaqar.

Hi Flavio. This was an interesting read. As somebody whose attention has recently been drawn to Zaqar, I am quite interested in seeing it graduate.

# Concerns - Concern on operational burden of requiring NoSQL deploy expertise to the mix of openstack operational skills

For those of you not familiar with Zaqar, it currently supports two NoSQL drivers - MongoDB and Redis - and those are the only two drivers it supports for now. This will require operators willing to use Zaqar to maintain a new (?) NoSQL technology in their system. Before expressing our thoughts on this matter, let me say that: 1. By removing the SQLAlchemy driver, we basically removed the chance for operators to use an already deployed OpenStack-technology 2. Zaqar won't be backed by any AMQP based messaging technology for now. Here's[0] a summary of the research the team (mostly done by Victoria) did during Juno 3. We (OpenStack) used to require Redis for the zmq matchmaker 4. We (OpenStack) also use memcached for caching and, as the oslo caching lib becomes available - or a wrapper on top of dogpile.cache - Redis may be used in place of memcached in more and more deployments. 5. Ceilometer's recommended storage driver is still MongoDB, although Ceilometer now has support for sqlalchemy. (Please correct me if I'm wrong).

That being said, it's obvious we already, to some extent, promote some NoSQL technologies. However, for the sake of the discussion, let's assume we don't. I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't keep avoiding these technologies. NoSQL technologies have been around for years and we should be prepared - including OpenStack operators - to support these technologies. Not every tool is good for all tasks - one of the reasons we removed the sqlalchemy driver in the first place - therefore it's impossible to keep a homogeneous environment for all services.

I wholeheartedly agree that non-traditional storage technologies that are becoming mainstream are good candidates for use cases where SQL-based storage gets in the way. I wish there wasn't so much FUD (warranted or not) about MongoDB, but that is the reality we live in.

With this, I'm not suggesting we ignore the risks and the extra burden this adds but, instead of attempting to avoid it completely by not evolving the stack of services we provide, we should probably work on defining a reasonable subset of NoSQL services we are OK with supporting. This will help make the burden smaller and it'll give operators the option to choose. [0] http://blog.flaper87.com/post/marconi-amqp-see-you-later/

- Concern on should we really reinvent a queue system rather than piggyback on one

As mentioned in the meeting on Tuesday, Zaqar is not reinventing message brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack flavor on top. [0]

I think Zaqar is more like SMTP and IMAP than AMQP. You're not really trying to connect two processes in real time. You're trying to do fully asynchronous messaging with fully randomized access to any message.
Perhaps somebody should explore whether the approaches taken by large scale IMAP providers could be applied to Zaqar. Anyway, I can't imagine writing a system to intentionally use the semantics of IMAP and SMTP. I'd be very interested in seeing actual use cases for it; apologies if those have been posted before.

Some things that differentiate Zaqar from SQS are its capability for supporting different protocols without sacrificing multi-tenancy and the other intrinsic features it provides. Some protocols you may consider for Zaqar are: STOMP, MQTT. As far as the backend goes, Zaqar is not re-inventing it either. It sits on top of existing storage technologies that have proven to be fast and reliable for this task. The choice of using NoSQL technologies has a lot to do with this particular thing and the fact that Zaqar needs storage capable of scaling and replicating, with good support for failover.

What's odd to me is that other systems like Cassandra and Riak are not being discussed. There are well-documented large-scale message storage systems on both, and neither is encumbered by the same licensing FUD as MongoDB. Anyway, again, if we look at this as a place to store and retrieve messages, and not as a queue, then talking about databases, instead of message brokers, makes a lot more sense.

- concern on the maturity of the NoSQL non-AGPL backend (Redis)
Re: [openstack-dev] [neutron] Status of Neutron at Juno-3
I didn't know that we could ask for FFE, so I'd like to ask (if still in time) for: https://blueprints.launchpad.net/neutron/+spec/agent-child-processes-status https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:bp/agent-child-processes-status,n,z to get the ProcessMonitor implemented in the l3_agent and dhcp_agent at least. I believe the work is ready (I need to check the radvd respawn in the l3 agent). The ProcessMonitor class is already merged. Best regards, Miguel Ángel.

- Original Message - On Wed, Sep 3, 2014 at 10:19 AM, Mark McClain m...@mcclain.xyz wrote: On Sep 3, 2014, at 11:04 AM, Brian Haley brian.ha...@hp.com wrote: On 09/03/2014 08:17 AM, Kyle Mestery wrote: Given how deep the merge queue is (146 currently), we've effectively reached feature freeze in Neutron now (likely other projects as well). So this morning I'm going to go through and remove BPs from Juno which did not make the merge window. I'll also be putting temporary -2s in the patches to ensure they don't slip in as well. I'm looking at FFEs for the high priority items which are close but didn't quite make it: https://blueprints.launchpad.net/neutron/+spec/l3-high-availability https://blueprints.launchpad.net/neutron/+spec/add-ipset-to-security https://blueprints.launchpad.net/neutron/+spec/security-group-rules-for-devices-rpc-call-refactor

I guess I'll be the first to ask for an exception for a Medium since the code was originally completed in Icehouse: https://blueprints.launchpad.net/neutron/+spec/l3-metering-mgnt-ext The neutronclient-side code was committed in January, and the neutron side, https://review.openstack.org/#/c/70090, has had mostly positive reviews since then. I've really just spent the last week re-basing it as things moved along.

+1 for FFE. I think this is good community work that fell through the cracks.

I agree, and I've marked it as RC1 now. I'll sort through these with ttx on Friday and get more clarity on its official status. Thanks, Kyle

mark
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
On 2 September 2014 19:16, Michael Still mi...@stillhq.com wrote: We're soon to hit feature freeze, as discussed in Thierry's recent email. I'd like to outline the process for requesting a freeze exception: * your code must already be up for review * your blueprint must have an approved spec * you need three (3) sponsoring cores for an exception to be granted * exceptions must be granted before midnight, Friday this week (September 5) UTC * the exception is valid until midnight Friday next week (September 12) UTC when all exceptions expire

Sorry to top post on this, but I need to clarify this point: "your blueprint must have an approved spec". I have unapproved the *blueprints* as part of removing things from juno-3. The reason for this is that drivers control the approved status, but not the milestone. I have added a dated note at the base of each whiteboard, explaining what was happening to the blueprint. Yes, that all kinda sucks, but it's what we have right now. This is independent of the spec having been approved and merged into the specs repo. So we can still tell whether a spec got approved for juno by looking at the juno specs here: http://specs.openstack.org/openstack/nova-specs/ Thanks, John
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
On 2 September 2014 21:36, Dan Genin daniel.ge...@jhuapl.edu wrote: Just out of curiosity, what is the rationale behind upping the number of core sponsors for a feature freeze exception to 3 if only two +2s are required to merge? In Icehouse, IIRC, two core sponsors were deemed sufficient.

We tried having 2 cores in the past, and stuff still didn't get reviewed. Usually one of the sponsors had other things crop up that took priority, or just didn't get a chance to review it. The idea of 3 is that we can lose one person and still have enough people to merge the code. If that doesn't work out, then we will try something different next time. It was discussed in the nova-meeting around spec freeze time, and at the mid-cycle a tiny bit, unless I totally misremember that. Why do this at all? Well, we want cores to focus on bug reviews post-FF. So they need to find extra time to review any of the exceptions, hence the opt-in process. Thanks, John
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
-----Original Message----- From: Nikola Đipanov [mailto:ndipa...@redhat.com] Sent: 03 September 2014 10:50 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno

<snip> I will follow up with a more detailed email about what I believe we are missing, once the FF settles and I have applied some soothing creme to my burnout wounds, but currently my sentiment is: Contributing features to Nova nowadays SUCKS!!1 (even as a core reviewer) We _have_ to change that! N.

While agreeing with your overall sentiment, what worries me a tad is the implied perception that contributing as a core should somehow be easier than as a mortal. While I might expect cores to produce better initial code, I thought the process and standards were intended to be a level playing field. Has anyone looked at the review bandwidth issue from the perspective of whether there has been a change in the amount of time cores now spend contributing vs. reviewing? Maybe there's an opportunity to get cores to mentor non-cores to do the code production, freeing up review cycles? Phil
Re: [openstack-dev] [Zaqar] Comments on the concerns that arose during the TC meeting
Hey Clint, Thanks for reading; some comments in-line:

On 09/04/2014 10:30 AM, Clint Byrum wrote: Excerpts from Flavio Percoco's message of 2014-09-04 00:08:47 -0700: [snip]

- Concern on should we really reinvent a queue system rather than piggyback on one

As mentioned in the meeting on Tuesday, Zaqar is not reinventing message brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack flavor on top. [0]

I think Zaqar is more like SMTP and IMAP than AMQP. You're not really trying to connect two processes in real time. You're trying to do fully asynchronous messaging with fully randomized access to any message. Perhaps somebody should explore whether the approaches taken by large scale IMAP providers could be applied to Zaqar. Anyway, I can't imagine writing a system to intentionally use the semantics of IMAP and SMTP. I'd be very interested in seeing actual use cases for it; apologies if those have been posted before.

Some things that differentiate Zaqar from SQS are its capability for supporting different protocols without sacrificing multi-tenancy and the other intrinsic features it provides. Some protocols you may consider for Zaqar are: STOMP, MQTT. As far as the backend goes, Zaqar is not re-inventing it either. It sits on top of existing storage technologies that have proven to be fast and reliable for this task. The choice of using NoSQL technologies has a lot to do with this particular thing and the fact that Zaqar needs storage capable of scaling and replicating, with good support for failover.

What's odd to me is that other systems like Cassandra and Riak are not being discussed. There are well-documented large-scale message storage systems on both, and neither is encumbered by the same licensing FUD as MongoDB.

FWIW, they both have been discussed. As far as Cassandra goes, we raised the red flag after reading this post[0]. The post itself may be obsolete already, but I don't think I have enough knowledge about Cassandra to actually figure this out. Some folks have come to us asking for a Cassandra driver and they were interested in contributing/working on one. I really hope that will happen someday, although it'll certainly happen as an external driver. Riak, on the other hand, was certainly a good candidate. What made us go with MongoDB and Redis is that they're both good for the job, they are both likely already deployed in OpenStack clouds, and we have enough knowledge to provide support and maintenance for both drivers. As a curious note, ElasticSearch and Swift have also been brought up several times as valid stores for Zaqar. I haven't thought about this thoroughly, but I think they'd do a good job. [0] http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

Anyway, again, if we look at this as a place to store and retrieve messages, and not as a queue, then talking about databases, instead of message brokers, makes a lot more sense.

- concern on the maturity of the NoSQL non-AGPL backend (Redis)

The Redis backend just landed and I've been working on a gate job for it today. Although it hasn't been tested in production, if Zaqar graduates, it still has a full development cycle to be tested and improved before the first integrated release happens.

I'd be quite interested to see how it is expected to scale. From my very quick reading of the driver, it only supports a single redis server. No consistent hash ring or anything like that.

Indeed, support for redis-cluster is on our roadmap[0]. As of now, it can be scaled by using Zaqar pools.
You can create several pools of redis nodes and balance queues across them. The next series of benchmarks will be done on this new Redis driver. I hope those will be ready soon. [0] https://blueprints.launchpad.net/zaqar/+spec/redis-pool

# Use Cases

In addition to the aforementioned concerns and comments, I also would like to share an etherpad that contains some use cases that other integrated projects have for Zaqar[0]. The list is not exhaustive and it'll contain more information before the next meeting. [0] https://etherpad.openstack.org/p/zaqar-integrated-projects-use-cases

Just taking a look, there are two basic applications needed: 1) An inbox. Horizon wants to know when snapshots are done. Heat wants to know what happened during a stack action. Etc. 2) A user-focused message queue. Heat wants to push data to agents. Swift wants to synchronize processes when things happen. To me, #1 is Zaqar as it is today. #2 is the one that I worry may not be served best by bending #1 onto it.

Push semantics are being developed. We've had enough discussions that have helped prepare the ground for it. However, I believe both use cases could be covered by Zaqar as-is. Could you elaborate a bit more on #2? Especially on why you think Zaqar as-is can't serve this specific case? Also, feel free to add use-cases to that etherpad if
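To make the "Redis is a good fit for this workload" point concrete: even before pooling, Redis list operations give you an efficient FIFO with blocking consumers, which is roughly the primitive a driver builds on (a standalone redis-py sketch, not Zaqar's actual driver code):

    import json
    import redis

    r = redis.StrictRedis(host="localhost", port=6379)

    # Producer: LPUSH is O(1); the list acts as the queue.
    r.lpush("zaqar:demo", json.dumps({"event": "image.uploaded"}))

    # Consumer: BRPOP blocks until a message arrives (5 s timeout),
    # giving pop-with-wait semantics without polling.
    item = r.brpop("zaqar:demo", timeout=5)
    if item is not None:
        _queue, payload = item
        print(json.loads(payload))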
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
One final note: the specs referenced above didn't get approved until Spec Freeze, which seemed to leave me with less time to implement things. In fact, it seemed that a lot of specs didn't get approved until spec freeze. Perhaps if we had more staggered approval of specs, we'd have more staggered submission of patches, and thus less of a sudden influx of patches in the couple of weeks before feature proposal freeze.

Yeah, I think the specs were getting approved too late in the cycle; I was actually surprised at how far out the schedules were going in allowing things in and then allowing exceptions after that. Hopefully the ideas around priorities/slots/runways will help stagger some of this also. I think there is a problem with the pattern that seemed to emerge in June, where the J.1 period was taken up with spec review (a lot of good reviews happened early in that period, but the approvals kind of came in a lump at the end), meaning that the implementation work itself only seemed to really kick in during J.2 - and, not surprisingly given the complexity of some of the changes, ran late into J.3. We also, as previously noted, didn't do any prioritization between those specs that were approved - so it was always going to be a race to see who managed to get code up for review first.

It kind of feels to me as if the ideal model would be if we were doing spec review for K now (i.e. during the FF / stabilization period) so that we hit Paris with a lot of the input already registered and a clear idea of the range of things folks want to do. We shouldn't really have to ask for session suggestions for the summit - they should be something that can be extracted from the proposed specs (maybe we do voting across the specs or something like that). In that way the summit would be able to confirm the list of specs for K and the priority order. With the current state of the review queue maybe we can't quite hit this pattern for K, but it would be worth aspiring to for L? Phil
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
Sorry for another top post, but I like how Nikola has pulled this problem apart, and wanted to respond directly to his response.

On 3 September 2014 10:50, Nikola Đipanov ndipa...@redhat.com wrote: The reason many features including my own may not make the FF is not because there was not enough buy-in from the core team (let's be completely honest - I have 3+ other core members working for the same company that are by nature of things easier to convince), but because of any of the following:

* Crippling technical debt in some of the key parts of the code

+1 We have problems that need solving. One of the ideas behind the slots proposal is to encourage work on the urgent technical debt, before related features are even approved.

* that we have not been acknowledging as such for a long time

-1 We keep saying that's cool, but we have to fix/finish XXX first. But... we have been very bad at: * remembering that, and recording that * actually fixing those problems

* which leads to proposed code being arbitrarily delayed once it makes the glaring flaws in the underlying infra apparent

Sometimes we only spot this stuff in code reviews, where you throw up reading all the code around the change, and see all the extra complexity being added to a fragile bit of the code, and well, then you really don't want to be the person who clicks approve on that. We need to track this stuff better. Every time it happens, we should try to make a note to go back there and do more tidy-ups.

* and that the specs process has been completely and utterly useless in helping uncover (not that the process itself is useless, it is very useful for other things)

Yeah, it hasn't helped for this. I don't think we should do this, but I keep thinking about making specs two-step: * write a general direction doc * go write the code, maybe upload as WIP * write the documentation part of the spec * get docs merged before any code

I am almost positive we can turn this rather dire situation around easily in a matter of months, but we need to start doing it! It will not happen through pinning arbitrary numbers to arbitrary processes.

+1 This is ongoing, but there are some major things I feel we should stop and fix in kilo. ...and that will make getting features in much worse for a little while, but it will be much better on the other side.

I will follow up with a more detailed email about what I believe we are missing, once the FF settles and I have applied some soothing creme to my burnout wounds

Awesome, please catch up with jogo who was also trying to build this list. I would love to continue to contribute to that too. Might be worth moving into here: https://etherpad.openstack.org/p/kilo-nova-summit-topics The idea was/is to use that list to decide what fills up the majority of code slots in Juno.

but currently my sentiment is: Contributing features to Nova nowadays SUCKS!!1 (even as a core reviewer) We _have_ to change that!

Agreed. In addition, our bug list would suggest our users are seeing the impact of this technical debt.
My personal feeling is we also need to tidy up our testing debt too: * document major bits that are NOT tested, so users are clear * document what combinations and features we actually see tested upstream * support different levels of testing: on-demand+daily vs. every commit * make it easier for interested parties to own and maintain some testing * plan for removing the untested code paths in L * allow for untested code to enter the tree, as experimental, with the expectation it gets removed in the following release if not tested, and architected so that is possible (note this means supporting experimental APIs that can be ripped out at a later date).

We have started doing some of the above work. But I think we need to hold ALL code to the same standard. It seems it will take time to agree on that standard, but the above is an attempt to compromise between speed of innovation and stability. Thanks, John
Re: [openstack-dev] [neutron][IPv6] Neighbor Discovery for HA
Carl, Thanks a lot for your reply! If I understand correctly, in the VRRP case, keepalived will be responsible for sending out GARPs? By checking the code you provided, I can see all the _send_gratuitous_arp_packet calls are wrapped by an "if not is_ha" condition. Xu Han

On 09/04/2014 06:06 AM, Carl Baldwin wrote: It should be noted that send_arp_for_ha is a configuration option that preceded the more recent in-progress work to add VRRP-controlled HA to Neutron's router. The option was added, I believe, to cause the router to send (by default) 3 GARPs to the external gateway if the router was removed from one network node and added to another by some external script or manual intervention. It did not send anything on the internal network ports. VRRP is a different story and the code in review [1] sends GARPs on internal and external ports. Hope this helps avoid confusion in this discussion. Carl [1] https://review.openstack.org/#/c/70700/37/neutron/agent/l3_ha_agent.py

On Mon, Sep 1, 2014 at 8:52 PM, Xu Han Peng pengxu...@gmail.com wrote: Anthony, Thanks for your reply. If an HA method like VRRP is used for an IPv6 router, according to the VRRP RFC with IPv6 included, the servers should be auto-configured with the active router's LLA as the default route before the failover happens and should still retain that route after the failover. In other words, there should be no need to use two LLAs for the default route of a subnet unless load balancing is required. When the backup router becomes the master router, the backup router should be responsible for immediately sending out an unsolicited ND neighbor advertisement with the associated LLA (the previous master's LLA) to update the bridge learning state, and for sending out router advertisements with the same options as the previous master to maintain the route and bridge learning. This is shown in http://tools.ietf.org/html/rfc5798#section-4.1 and the actions the backup router should take after failover are documented here: http://tools.ietf.org/html/rfc5798#section-6.4.2. The need for immediate message sending and periodic message sending is documented here: http://tools.ietf.org/html/rfc5798#section-2.4 Since the keepalived manager support for L3 HA is merged: https://review.openstack.org/#/c/68142/43. And keepalived release 1.2.0 supports VRRP IPv6 features (http://www.keepalived.org/changelog.html, see Release 1.2.0 | VRRP IPv6 Release). I think we can check if keepalived can satisfy our requirement here and if that will cause any conflicts with RADVD. Thoughts? Xu Han

On 08/28/2014 10:11 PM, Veiga, Anthony wrote: Anthony and Robert, Thanks for your reply. I don't know if the arping is there for NAT, but I am pretty sure it's for the HA setup to broadcast the router's own change, since the arping is controlled by the send_arp_for_ha config. By checking the man page of arping, you can find that the arping -A we use in the code sends out an ARP REPLY instead of an ARP REQUEST. This is like saying "I am here" instead of "where are you". I didn't realize this either until Brian pointed this out in my code review below.

That's what I was trying to say earlier. Sending out the RA has the same effect. The RA says "I'm here, oh and I'm also a router" and should supersede the need for an unsolicited NA. The only thing to consider here is that RAs are sent from LLAs. If you're doing IPv6 HA, you'll need to have two gateway IPs for the RA of the standby to work. So far as I know, I think there's still a bug out on this since you can only have one gateway per subnet.
http://linux.die.net/man/8/arping https://review.openstack.org/#/c/114437/2/neutron/agent/l3_agent.py Thoughts? Xu Han

On 08/27/2014 10:01 PM, Veiga, Anthony wrote: Hi Xuhan, What I saw is that GARP is sent to the gateway port and also to the router ports, from a neutron router. I'm not sure why it's sent to the router ports (internal network). My understanding of the arping to the gateway port is that it is needed for proper NAT operation. Since we are not planning to support IPv6 NAT, this is not required/needed for IPv6 any more?

I agree that this is no longer necessary. There is an abandoned patch that disabled the arping for the IPv6 gateway port: https://review.openstack.org/#/c/77471/3/neutron/agent/l3_agent.py thanks, Robert

On 8/27/14, 1:03 AM, Xuhan Peng pengxu...@gmail.com wrote: As a follow-up action of yesterday's IPv6 sub-team meeting, I would like to start a discussion about how to support l3 agent HA when the IP version is IPv6. This problem is triggered by bug [1], where sending a gratuitous ARP packet for HA doesn't work for IPv6 subnet gateways. This is because neighbor discovery, instead of ARP, should be used for IPv6. After reading the comments on code review [2], my thinking on how to solve this problem comes down to how to send out a neighbor advertisement for IPv6 routers, just like sending an ARP reply for IPv4 routers. I searched for utilities which can do this and only found a utility called ndsend [3] as part of
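For reference, the unsolicited neighbor advertisement being discussed is also easy to emit from Python; a sketch with scapy (the address and MAC are placeholders, and this needs root):

    from scapy.all import IPv6, ICMPv6ND_NA, ICMPv6NDOptDstLLAddr, send

    lla = "fe80::f816:3eff:fe12:3456"   # router's link-local address (placeholder)
    mac = "fa:16:3e:12:34:56"           # its MAC address (placeholder)

    # Unsolicited NA to all-nodes: Router flag set, Solicited clear,
    # Override set, plus the target link-layer address option -- the
    # IPv6 analogue of the gratuitous ARP reply that arping -A sends.
    na = (IPv6(src=lla, dst="ff02::1")
          / ICMPv6ND_NA(tgt=lla, R=1, S=0, O=1)
          / ICMPv6NDOptDstLLAddr(lladdr=mac))
    send(na)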
[openstack-dev] [Glance][FFE] glance_store switch-over and random access to image data
Greetings, I'd like to request an FFE for 2 features I've been working on during Juno which, unfortunately, have been delayed for different reasons during this time.

The first feature is the switch-over to glance_store. Glance store, for those not familiar with it, is a library containing the code that used to live under `glance/store`. During Icehouse, this idea came up and I started working on it right away. By the time Icehouse was released, the library was not mature enough and we (glance-core) were a bit concerned about the risks of rushing this work. At the Juno summit, I led a session on this library where we discussed the current status and agreed on a path forward for this library and the other feature below. The library contains glance's old store code with very few changes to the API, in order to make it decent enough for a library. As you can see in the review[0], which has been around since June 17th, the amount of code changed is small. Once the rename of the library[1] happened, the gate tests started passing. This is to say that the risks related to the library itself are low. One bit that worries me is the alignment between the current glance_store library and the stores that still exist in glance. I believe some patches have landed that we need to port to this new library. However, I'm less worried about that because we can still backport them and release a new version of the library and it'll still be consumed by Glance - sorry if it seems I'm oversimplifying the issue.

The second feature that I'd like to get an FFE for is the random access to image data. This feature was approved and agreed on for Juno. Instead of implementing it in Glance directly, I decided - genuinely - to implement it on top of glance_store to avoid the re-implementation once the library was done. Unfortunately, due to the delays the library had, this patch got stuck in the review queue. The feature has been in the review[2] queue since Jun 25. The feature was implemented on top of the API v2, and users have to opt in to access random parts of the image data. This feature has to be backed by glance_store and it depends on the store's support for random access. I believe the risk related to this feature is low. Both blueprints[3][4] were discussed and agreed on, although the latter doesn't reflect it. Cheers, Flavio [0] https://review.openstack.org/#/c/100636/ [1] http://lists.openstack.org/pipermail/openstack-dev/2014-August/044203.html [2] https://review.openstack.org/#/c/103068/ [3] https://blueprints.launchpad.net/glance/+spec/create-store-package [4] https://blueprints.launchpad.net/glance/+spec/restartable-image-download -- @flaper87 Flavio Percoco
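On the second feature: random access to image data boils down to ranged reads of the v2 data endpoint. Assuming the opt-in surface is a standard HTTP Range header - an assumption on my part, the patch defines the actual mechanism - usage would look like:

    import requests

    GLANCE = "http://glance.example.com:9292"   # placeholder endpoint
    HDRS = {"X-Auth-Token": "TOKEN"}            # placeholder token
    image = "IMAGE_UUID"

    # Fetch only the first 4 KiB of the image data; a "206 Partial
    # Content" status means the server honored the ranged read.
    resp = requests.get("%s/v2/images/%s/file" % (GLANCE, image),
                        headers=dict(HDRS, Range="bytes=0-4095"))
    print(resp.status_code)   # 206 if ranged reads are supported
    chunk = resp.content      # a restartable download resumes this way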
Re: [openstack-dev] [rally][iperf] Benchmarking network performance
Hi Ajay, Thank you for your work on this. Could you please send your code for review? Here is the instruction: https://wiki.openstack.org/wiki/Rally/Develop#How_to_contribute Thanks. Best regards, Boris Pavlovic

On Wed, Sep 3, 2014 at 10:47 PM, Ajay Kalambur (akalambu) akala...@cisco.com wrote: Hi, I am looking into the following blueprint, which requires that network performance tests be done as part of a scenario. I plan to implement this using iperf, basically as a scenario which includes a client/server VM pair. The client then sends out TCP traffic using iperf to the server and the VM throughput is recorded. I have a WIP patch attached to this email. The patch has a dependency on the following 2 reviews: https://review.openstack.org/#/c/103306/ https://review.openstack.org/#/c/96300 On top of this it creates a new VM performance scenario and uses floating ips to access the VM, download iperf to the VM and then run throughput tests. The code will be made more modular, but this patch will give you a good idea of what's in store. We also need to handle the case next where no floating ip is available and we assume direct access; we need to have ssh to install the tool and drive the tests. Please look at the attached diff and let me know if overall the flow looks fine. If it does, I can make the code more modular and proceed. Note this is still work in progress. Ajay
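Wherever this ends up in Rally's plugin tree, the core of the measurement is just "ssh to the client VM over its floating ip and run iperf against the server VM"; a minimal sketch with paramiko (the IPs, user and key are placeholders):

    import re
    import paramiko

    def measure_throughput(client_fip, server_ip, user="cirros",
                           key_file="/home/stack/.ssh/id_rsa"):
        """Run iperf from the client VM against the server; return Mbits/sec."""
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect(client_fip, username=user, key_filename=key_file)
        try:
            # 10-second TCP test; iperf prints e.g. "... 940 Mbits/sec"
            _, stdout, _ = ssh.exec_command("iperf -c %s -t 10" % server_ip)
            out = stdout.read().decode()
        finally:
            ssh.close()
        match = re.search(r"([\d.]+)\s*Mbits/sec", out)
        return float(match.group(1)) if match else None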
Re: [openstack-dev] [neutron] New meeting rotation starting next week
Kevin Benton wrote: How is the master list compiled into a calendar? Is it possible to use that same system to filter by project?

It's manual. I subscribe to the wiki page and reflect the changes in the Google Cal. It's painful and error-prone. If anyone wants to do it, I'm happy to give the keys and delegate the responsibility. But frankly, what we really need is this: http://lists.openstack.org/pipermail/openstack-infra/2013-December/000517.html There was a group of students working on it: http://git.openstack.org/cgit/openstack-infra/gerrit-powered-agenda/ Lance, any news on that? Should we reboot the project? -- Thierry Carrez (ttx)
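The linked proposal is essentially "keep the schedule as reviewable YAML and generate the calendar mechanically"; a toy version of that pipeline with PyYAML and the icalendar package (the YAML schema here is made up):

    from datetime import datetime

    import yaml
    from icalendar import Calendar, Event

    MEETINGS_YAML = """
    - name: Neutron team meeting
      day: MO
      utc_start: "2014-09-08 21:00"
    """

    cal = Calendar()
    for m in yaml.safe_load(MEETINGS_YAML):
        ev = Event()
        ev.add("summary", m["name"])
        ev.add("dtstart", datetime.strptime(m["utc_start"], "%Y-%m-%d %H:%M"))
        ev.add("rrule", {"freq": "weekly", "byday": m["day"]})  # repeats weekly
        cal.add_component(ev)

    with open("meetings.ics", "wb") as f:
        f.write(cal.to_ical())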
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 4 September 2014 19:53, Nejc Saje ns...@redhat.com wrote: I used the terms that are used in the original caching use-case, as described in [1] and are used in the pypi lib as well[2]. With the correct approach, there aren't actually any partitions; 'replicas' actually denotes the number of times you hash a node onto the ring. As for nodes/keys, what's your suggestion?

So - we should change the Ironic terms then, I suspect (but let's check with Deva, who wrote the original code, where he got them from). The parameters we need to create a ring are: - how many fallback positions we use for data (currently referred to as replicas) - how many times we hash the servers hosting data into the ring (currently inferred via the hash_partition_exponent / server count) - the servers. And then we probe data items as we go.

The original paper isn't http://www.martinbroadhurst.com/Consistent-Hash-Ring.html - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.1879 is referenced by it, and that paper doesn't include the term replica count at all. In other systems like cassandra, 'replicas' generally refers to how many servers end up holding a copy of the data: Martin Broadhurst's paper uses 'replica' there in quite a different sense - I much prefer the Ironic use, which says how many servers will be operating on the data: it's externally relevant. I've no objection talking about keys, but 'node' is an API object in Ironic, so I'd rather we talk about hosts - or make it something clearly not node-like, such as 'bucket' (which the 1997 paper talks about in describing consistent hash functions).

So, proposal: - key - a stringifyable thing to be mapped to buckets - bucket - a worker/store that wants keys mapped to it - replicas - number of buckets a single key wants to be mapped to - partitions - number of total divisions of the hash space (power of 2 required)

I've opened a bug[3], so you can add a Closes-Bug to your patch.

Thanks! -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
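Using roughly those proposed terms, the consistency property under discussion fits in a few lines: hash each bucket onto the ring many times, map a key to the next bucket(s) clockwise, and adding a bucket should then remap only about 1/#buckets of the keys (an illustrative sketch, neither the Ironic nor the ceilometer code):

    import bisect
    import hashlib

    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class HashRing(object):
        def __init__(self, buckets, replicas=1, hashes_per_bucket=100):
            self.replicas = replicas
            self.ring = sorted((_hash("%s-%d" % (b, i)), b)
                               for b in buckets
                               for i in range(hashes_per_bucket))
            self.positions = [pos for pos, _ in self.ring]

        def get_buckets(self, key):
            # Walk clockwise from the key's position until `replicas`
            # distinct buckets have been collected.
            i = bisect.bisect(self.positions, _hash(key))
            found = []
            while len(found) < self.replicas:
                bucket = self.ring[i % len(self.ring)][1]
                if bucket not in found:
                    found.append(bucket)
                i += 1
            return found

    keys = ["node-%d" % i for i in range(1000)]
    old = HashRing(["cond-%d" % i for i in range(10)])
    new = HashRing(["cond-%d" % i for i in range(11)])
    moved = sum(old.get_buckets(k) != new.get_buckets(k) for k in keys)
    print("remapped: %.1f%%" % (100.0 * moved / len(keys)))  # ~9%, i.e. ~1/11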
Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?
On 14/08/14 00:03, James Polley wrote: In recent history, we've been looking each week at stats from http://russellbryant.net/openstack-stats/tripleo-openreviews.html to get a gauge on how our review pipeline is tracking. The main stats we've been tracking have been the 'since the last revision without -1 or -2' figures. I've included some history at [1], but the summary is that our 3rd quartile has slipped from 13 days to 16 days over the last 4 weeks or so. Our 1st quartile is fairly steady lately, around 1 day (down from 4 a month ago) and the median is unchanged around 7 days. There was lots of discussion in our last meeting about what could be causing this[2]. However, the thing we wanted to bring to the list for discussion is: are we tracking the right metric? Should we be looking at something else to tell us how well our pipeline is performing? The meeting logs have quite a few suggestions about ways we could tweak the existing metrics, but if we're measuring the wrong thing that's not going to help. I think that what we are looking for is a metric that lets us know whether the majority of patches are getting feedback quickly. Maybe there's some other metric that would give us a good indication? Bring back auto abandon... Gerrit at one stage not so long ago used to auto abandon patches that had negative feedback and were over a week without activity; I believe this was removed when a gerrit upgrade gave core reviewers the ability to abandon other people's patches. This was the single best part of the entire process to keep things moving: o submitters were forced to keep patches current o reviewers were not looking at stale or already known broken patches o if something wasn't important it got abandoned and was never heard of again; if it was important it would be reopened o patch submitters were forced to engage with the reviewers quickly on negative feedback instead of leaving a patch sitting there indefinitely Here are the numbers I think we should be looking at: http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt Queue growth in the last 30 days: 72 (2.4/day) http://russellbryant.net/openstack-stats/tripleo-reviewers-90.txt Queue growth in the last 90 days: 132 (1.5/day) Obviously this isn't sustainable. Re-enabling auto abandon would ensure the majority of the patches we are looking at are current, of good quality, and not lost in a sea of -1's. How would people feel about turning it back on? Can it be done on a per-project basis? To make the whole process a little friendlier we could increase the time frame from 1 week to 2. Derek.
[1] Current stats, since the last revision without -1 or -2:
Average wait time: 10 days, 17 hours, 6 minutes
1st quartile wait time: 1 days, 1 hours, 36 minutes
Median wait time: 7 days, 5 hours, 33 minutes
3rd quartile wait time: 16 days, 8 hours, 16 minutes
At last week's meeting we had: 3rd quartile wait time: 15 days, 13 hours, 47 minutes
A week before that: 3rd quartile wait time: 13 days, 9 hours, 11 minutes
The week before that was the mid-cycle, but the week before that:
19:53:38 lifeless Stats since the last revision without -1 or -2 :
19:53:38 lifeless Average wait time: 10 days, 17 hours, 49 minutes
19:53:38 lifeless 1st quartile wait time: 4 days, 7 hours, 57 minutes
19:53:38 lifeless Median wait time: 7 days, 10 hours, 52 minutes
19:53:40 lifeless 3rd quartile wait time: 13 days, 13 hours, 25 minutes
[2] Some of the things suggested as potential causes of the long 3rd quartile times:
* We have a small number of really old reviews that have only positive scores but aren't being landed
* Some reviews get a -1 but then sit for a long time waiting for the author to reply
* We have some really old reviews that suddenly get revived after a long period being in WIP or abandoned, which reviewstats seems to miscount
* Reviewstats counts weekends, we don't (so a change that gets pushed at 5pm US Friday and gets reviewed at 9am Aus Monday would be seen by us as having no wait time, but by reviewstats as ~36 hours)
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
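The weekend effect called out in the last bullet of [2] is easy to quantify, and would be cheap to fold into whatever metric the team settles on. A small sketch computing wait-time quartiles that skip Saturday and Sunday hours (the input samples and the simple lower-interpolation percentile are illustrative):

from datetime import datetime, timedelta


def business_hours(start, end):
    # Hours between two datetimes, skipping Saturdays and Sundays.
    total = timedelta()
    cur = start
    while cur < end:
        midnight = (cur + timedelta(days=1)).replace(
            hour=0, minute=0, second=0, microsecond=0)
        nxt = min(end, midnight)
        if cur.weekday() < 5:  # Mon=0 .. Fri=4
            total += nxt - cur
        cur = nxt
    return total.total_seconds() / 3600.0


def quartiles(values):
    values = sorted(values)
    pick = lambda q: values[int(q * (len(values) - 1))]
    return pick(0.25), pick(0.5), pick(0.75)


waits = [business_hours(pushed, reviewed) for pushed, reviewed in [
    (datetime(2014, 8, 29, 17, 0), datetime(2014, 9, 1, 9, 0)),
    (datetime(2014, 8, 26, 10, 0), datetime(2014, 8, 28, 10, 0)),
    (datetime(2014, 8, 20, 8, 0), datetime(2014, 9, 1, 12, 0)),
]]
# The Friday-5pm-to-Monday-9am sample counts as 16 business hours
# rather than 64 wall-clock hours.
print(quartiles(waits))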
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
The implementation in ceilometer is very different to the Ironic one - are you saying the test you linked fails with Ironic, or that it fails with the ceilometer code today? Disclaimer: in Ironic terms, node = conductor, key = host The test I linked fails with the Ironic hash ring code (specifically the part that tests consistency). With 1000 keys being mapped to 10 nodes, when you add a node: - current ceilometer code remaps around 7% of the keys (~1/#nodes) - Ironic code remaps 90% of the keys So just to underscore what Nejc is saying here... The key point is the proportion of such baremetal nodes that would end up being re-assigned when a new conductor is fired up. That was 100% clear, but thanks for making sure. The question was getting a proper understanding of why it was happening in Ironic. The ceilometer hashring implementation is good, but it uses the same terms very differently (e.g. replicas for partitions) - I'm adapting the key fix back into Ironic. I'd like to see us converge on a single implementation, and making sure the Ironic one is suitable for ceilometer seems applicable here (since ceilometer seems to need less from the API). Absolutely +1 on converging on a single implementation. That was our intent on the ceilometer side from the get-go, to promote a single implementation to oslo that both projects could share. This turned out not to be possible in the short term when the non-consistent aspect of the Ironic implementation was discovered by Nejc, with the juno-3 deadline looming. However for kilo, we would definitely be interested in leveraging a best-of-breed implementation from oslo. If reassigning was cheap Ironic wouldn't have bothered having a hash ring :) Fair enough, I was just allowing for the possibility that avoidance of needless re-mapping wasn't as high a priority on the Ironic side as it was for ceilometer. Cheers, Eoghan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
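To make the ~7% vs 90% numbers concrete, here is a minimal consistent hash ring written from scratch for illustration (it is neither project's actual code). Each bucket is hashed onto the ring many times - 'replicas' in the caching-paper sense - and a key maps to the first bucket point at or after its own hash; with a correct ring, adding an eleventh bucket moves only about 1/11 of the keys:

import bisect
import hashlib


def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing(object):
    def __init__(self, buckets, points_per_bucket=100):
        # Hash each bucket onto the ring many times so the key space
        # is divided reasonably evenly between buckets.
        self._ring = sorted(
            (_hash("%s-%d" % (b, i)), b)
            for b in buckets for i in range(points_per_bucket))
        self._positions = [pos for pos, _ in self._ring]

    def get_bucket(self, key):
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = ["key-%d" % i for i in range(1000)]
old = HashRing(["node-%d" % i for i in range(10)])
new = HashRing(["node-%d" % i for i in range(11)])
moved = sum(old.get_bucket(k) != new.get_bucket(k) for k in keys)
print("%.1f%% of keys remapped" % (100.0 * moved / len(keys)))  # ~9%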
[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Position statement == Over the past year I've increasingly come to the conclusion that Nova is heading for (or probably already at) a major crisis. If steps are not taken to avert this, the project is likely to lose a non-trivial amount of talent, both regular code contributors and core team members. That includes myself. This is not good for Nova's long term health and so should be of concern to anyone involved in Nova and OpenStack. For those who don't want to read the whole mail, the executive summary is that the nova-core team is an unfixable bottleneck in our development process with our current project structure. The only way I see to remove the bottleneck is to split the virt drivers out of tree and let them all have their own core teams in their area of code, leaving current nova core to focus on all the common code outside the virt driver impls. I nonetheless urge people to read the whole mail. Background information == I see many factors coming together to form the crisis: - Burn out of core team members from over work - Difficulty bringing new talent into the core team - Long delay in getting code reviewed and merged - Marginalization of code areas which aren't popular - Increasing size of nova code through new drivers - Exclusion of developers without corporate backing Each item on its own may not seem too bad, but combined they add up to a big problem. Core team burn out -- Having been involved in Nova for several dev cycles now, it is clear that the backlog of code up for review never goes away. Even intensive code review efforts at various points in the dev cycle make only a small impact on the backlog. This has a pretty significant impact on core team members, as their work is never done. At best, the dial is sometimes set to 10, instead of 11. Many people, myself included, have built tools to help deal with the reviews in a more efficient manner than plain gerrit allows for. These certainly help, but they can't ever solve the problem on their own - just make it slightly more bearable. And this is not even considering that core team members might have useful contributions to make in ways beyond just code review. Ultimately the workload is just too high to sustain the levels of review required, so core team members will eventually burn out (as they have done many times already). Even if one person attempts to take the initiative to heavily invest in review of certain features, it is often to no avail. Unless a second dedicated core reviewer can be found to 'tag team', it is hard for one person to make a difference. The end result is that a patch is +2d and then sits idle for weeks or more until a merge conflict requires it to be reposted, at which point even that one +2 is lost. This is a pretty demotivating outcome for both reviewers and the patch contributor. New core team talent -- It can't escape attention that the Nova core team does not grow in size very often. When Nova was younger and its code base was smaller, it was easier for contributors to get onto core because the base level of knowledge required was that much smaller. To get onto core today requires a major investment in learning Nova over a year or more. Even people who potentially have the latent skills may not have the time available to invest in learning the entirety of Nova. With the number of reviews proposed to Nova, the core team should probably be at least double its current size[1]. There is plenty of expertise in the project as a whole but it is typically focused into specific areas of the codebase.
There is nowhere we can find 20 more people with broad knowledge of the codebase who could be promoted even over the next year, let alone today. This is ignoring that many existing members of core are relatively inactive due to burnout and so need replacing. That means we really need another 25-30 people for core. That's not going to happen. Code review delays -- The obvious result of having too much work for too few reviewers is that code contributors face major delays in getting their work reviewed and merged. From personal experience, during Juno, I've probably spent 1 week in aggregate on actual code development vs 8 weeks waiting on code review. You have to constantly be on alert for review comments because unless you can respond quickly (and repost) while you still have the attention of the reviewer, they may not look again for days/weeks. The length of time to get work merged serves as a demotivator to actually do work in the first place. I've personally avoided doing a lot of the code refactoring and cleanup work that would improve the maintainability of the libvirt driver in the long term, because I can't face the battle to get it reviewed and merged. Other people have told me much the same. It is not uncommon to see changes that have been pending for 2 dev cycles, not because the code was bad but because
Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting
On Thu, 4 Sep 2014, Flavio Percoco wrote: Thanks for writing this up, interesting read. 5. Ceilometer's recommended storage driver is still MongoDB, although Ceilometer has now support for sqlalchemy. (Please correct me if I'm wrong). For sake of reference: Yes, MongoDB is currently the recommended store and yes, sqlalchemy support is present. Until recently only sqlalchemy support was tested in the gate. Two big changes being developed in Juno related to storage: * Improved read and write performance in the sqlalchemy setup. * time series storage and Gnocchi: https://julien.danjou.info/blog/2014/openstack-ceilometer-the-gnocchi-experiment I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't keep avoiding these technologies. NoSQL technologies have been around for years and we should be prepared - including OpenStack operators - to support these technologies. Not every tool is good for all tasks - one of the reasons we removed the sqlalchemy driver in the first place - therefore it's impossible to keep an homogeneous environment for all services. +1. Ain't that the truth. As mentioned in the meeting on Tuesday, Zaqar is not reinventing message brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack flavor on top. [0] In my efforts to track this stuff I remain confused on the points in these two questions: https://wiki.openstack.org/wiki/Zaqar/Frequently_asked_questions#How_does_Zaqar_compare_to_oslo.messaging.3F https://wiki.openstack.org/wiki/Zaqar/Frequently_asked_questions#Is_Zaqar_an_under-cloud_or_an_over-cloud_service.3F What or where is the boundary between Zaqar and existing messaging infrastructure? Not just in terms of technology but also use cases? The answers above suggest it's not super solid on the use case side, notably: In addition, several projects have expressed interest in integrating with Zaqar in order to surface events... Instead of Zaqar doing what it does and instead of oslo.messaging abstracting RPC, why isn't the end goal a multi-tenant, multi-protocol event pool? Wouldn't that have the most flexibility in terms of ecosystem and scalability? In addition to the aforementioned concerns and comments, I also would like to share an etherpad that contains some use cases that other integrated projects have for Zaqar[0]. The list is not exhaustive and it'll contain more information before the next meeting. [0] https://etherpad.openstack.org/p/zaqar-integrated-projects-use-cases For these, what is Zaqar providing that oslo.messaging (and its still extant antecedents) does not? I'm not asking to naysay Zaqar, but to understand more clearly what's going on. My interest here comes from a general interest in how events and notifications are handled throughout OpenStack. Thanks. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Fuel] Goals for 5.1.1 and 6.0
Thanks, Dmitry. Let's get a short status on these items during the Fuel Weekly Meeting today [1]. [1] https://etherpad.openstack.org/p/fuel-weekly-meeting-agenda On Wed, Sep 3, 2014 at 7:52 PM, Dmitry Pyzhov dpyz...@mirantis.com wrote: Feature blockers: Versioning https://blueprints.launchpad.net/fuel/+spec/nailgun-versioning for REST API https://blueprints.launchpad.net/fuel/+spec/nailgun-versioning-api, UI, serialization https://blueprints.launchpad.net/fuel/+spec/nailgun-versioning-rpc Ongoing activities: Nailgun plugins https://blueprints.launchpad.net/fuel/+spec/nailgun-plugins Stability and Reliability: Docs for serialization data Docs for REST API data https://blueprints.launchpad.net/fuel/+spec/documentation-on-rest-api-input-output Nailgun unit tests restructure Image based provisioning https://blueprints.launchpad.net/fuel/+spec/image-based-provisioning Granular deployment https://blueprints.launchpad.net/fuel/+spec/granular-deployment-based-on-tasks Artifact-based build system Power management Fencing https://blueprints.launchpad.net/fuel/+spec/ha-fencing Features: Advanced networking https://blueprints.launchpad.net/fuel/+spec/advanced-networking (blocked by Multi L2 support) Some of these items will not fit in 6.0, I guess. But we should work on them now. On Thu, Aug 28, 2014 at 4:26 PM, Mike Scherbakov mscherba...@mirantis.com wrote: Hi Fuelers, while we are busy with the last bugs which block us from releasing 5.1, we need to start thinking about upcoming releases. Some of you have already started POCs, some - specs, and I see discussions in ML and IRC. From an overall strategy perspective, the focus for 6.0 is: - OpenStack Juno release - Certify 100-node deployment. In terms of OpenStack, if not possible for Juno, let's do it for Icehouse - Send anonymous stats about deployment (deployment modes, features used, etc.) - Stability and Reliability Let's take a little break and think, as a first order of business, about the features, sustaining items and bugs which block us from releasing either 5.1.1 or 6.0. We have to start creating blueprints (and moving them to the 6.0 milestone) and make sure there are critical bugs assigned to the appropriate milestone, if there are any. Examples which come to my mind immediately: - Use a service token to auth in Keystone for upgrades (affects 5.1.1), instead of a plain admin login / pass. Otherwise it affects security, as the user has to keep the password in plain text - Decrease upgrade tarball size Please come up with blueprint and LP bug links, and a short explanation of why it's a blocker for upcoming releases. Thanks, -- Mike Scherbakov #mihgen ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Mike Scherbakov #mihgen ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] FFE request v2-on-v3-api
Hi, I'd like to request a FFE for 4 changesets from the v2-on-v3-api blueprint: https://review.openstack.org/#/c/113814/ https://review.openstack.org/#/c/115515/ https://review.openstack.org/#/c/115576/ https://review.openstack.org/#/c/11/ They have all already been approved and were in the gate for a while but just didn't quite make it through in time. So they shouldn't put any load on reviewers. Sponsoring cores: Kenichi Ohmichi John Garbutt Me Regards, Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 09/04/2014 11:51 AM, Robert Collins wrote: On 4 September 2014 19:53, Nejc Saje ns...@redhat.com wrote: I used the terms that are used in the original caching use-case, as described in [1] and are used in the pypi lib as well[2]. With the correct approach, there aren't actually any partitions; 'replicas' actually denotes the number of times you hash a node onto the ring. As for nodes/keys, what's your suggestion? So - we should change the Ironic terms then, I suspect (but let's check with Deva, who wrote the original code, where he got them from). The parameters we need to create a ring are: - how many fallback positions we use for data (currently referred to as replicas) - how many times we hash the servers hosting data into the ring (currently inferred via the hash_partition_exponent / server count) - the servers and then we probe data items as we go. The original paper isn't http://www.martinbroadhurst.com/Consistent-Hash-Ring.html - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.1879 is referenced by it, and that paper doesn't include the term replica count at all. In other systems like Cassandra, replicas generally refers to how many servers end up holding a copy of the data; Martin Broadhurst's paper uses replica there in quite a different sense - I much prefer the Ironic use, which says how many servers will be operating on the data: it's externally relevant. It doesn't contain that term precisely, but it does talk about replicating the buckets. What about using a descriptive name for this parameter, like 'distribution_quality', where the higher the value, the higher the distribution evenness (and the higher the memory usage)? I've no objection to talking about keys, but 'node' is an API object in Ironic, so I'd rather we talk about hosts - or make it something clearly not node-like, such as 'bucket' (which the 1997 paper talks about in describing consistent hash functions). So proposal: - key - a stringifyable thing to be mapped to buckets What about using the term 'item' from the original paper as well? - bucket - a worker/store that wants keys mapped to it - replicas - number of buckets a single key wants to be mapped to Can we keep this as an Ironic-internal parameter? It doesn't really affect the hash ring: if you want multiple buckets for your item, you just continue your journey along the ring and keep returning new buckets. Check out how the pypi lib does it: https://github.com/Doist/hash_ring/blob/master/hash_ring/ring.py#L119 - partitions - number of total divisions of the hash space (power of 2 required) I don't think there are any divisions of the hash space in the correct implementation, are there? I think that in the current Ironic implementation this tweaks the distribution quality, just like the 'replicas' parameter in the Ceilometer implementation. Cheers, Nejc I've opened a bug[3], so you can add a Closes-Bug to your patch. Thanks! -Rob ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
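To illustrate Nejc's point that the replica count doesn't have to live in the ring itself: in the pypi-style approach, mapping a key to N buckets just means continuing around the ring past the first hit and collecting distinct buckets. A self-contained sketch of that walk (illustrative, not either project's code):

import bisect
import hashlib


def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def get_buckets(ring, positions, key, n):
    # Walk the ring from the key's position, collecting up to n distinct
    # buckets; the ring itself knows nothing about the replica count.
    start = bisect.bisect(positions, _hash(key))
    found = []
    for i in range(len(ring)):
        bucket = ring[(start + i) % len(ring)][1]
        if bucket not in found:
            found.append(bucket)
        if len(found) == n:
            break
    return found


# Ring built as in the earlier sketch: each bucket hashed onto the ring
# many times, sorted by position.
buckets = ["conductor-%d" % i for i in range(5)]
ring = sorted((_hash("%s-%d" % (b, i)), b)
              for b in buckets for i in range(100))
positions = [pos for pos, _ in ring]
print(get_buckets(ring, positions, "node-abc", n=2))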
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
On 09/04/2014 02:07 AM, Joe Gordon wrote: On Wed, Sep 3, 2014 at 2:50 AM, Nikola Đipanov ndipa...@redhat.com wrote: On 09/02/2014 09:23 PM, Michael Still wrote: On Tue, Sep 2, 2014 at 1:40 PM, Nikola Đipanov ndipa...@redhat.com wrote: On 09/02/2014 08:16 PM, Michael Still wrote: Hi. We're soon to hit feature freeze, as discussed in Thierry's recent email. I'd like to outline the process for requesting a freeze exception: * your code must already be up for review * your blueprint must have an approved spec * you need three (3) sponsoring cores for an exception to be granted Can core reviewers who have features up for review have this number lowered to two (2) sponsoring cores, as they in reality then need four (4) cores (since they themselves are one (1) core but cannot really vote), making it an order of magnitude more difficult for them to hit this checkbox? That's a lot of numbers in that there paragraph. Let me re-phrase your question... Can a core sponsor an exception they themselves propose? I don't have a problem with someone doing that, but you need to remember that does reduce the number of people who have agreed to review the code for that exception. Michael has correctly picked up on a hint of snark in my email, so let me explain where I was going with that: The reason many features including my own may not make the FF is not because there was not enough buy-in from the core team (let's be completely honest - I have 3+ other core members working for the same company that are by the nature of things easier to convince), but because of any of the following: I find the statement about having multiple cores at the same company very concerning. To quote Mark McLoughlin, It is assumed that all core team members are wearing their upstream hat and aren't there merely to represent their employers' interests [0]. Your statement appears to be in direct conflict with Mark's idea of what a core reviewer is, an idea that IMHO is one of the basic tenets of OpenStack development. This is of course taking my words completely out of context - I was making a point of how arbitrary changing the number of reviewers needed is, and how it completely misses the real issues IMHO. I have no interest in continuing this particular debate further, and would appreciate it if people could refrain from resorting to such straw-man type arguments, as it can be very damaging to the overall level of conversation we need to maintain. [0] http://lists.openstack.org/pipermail/openstack-dev/2013-July/012073.html * Crippling technical debt in some of the key parts of the code * that we have not been acknowledging as such for a long time * which leads to proposed code being arbitrarily delayed once it makes the glaring flaws in the underlying infra apparent * and that the specs process has been completely and utterly useless in helping uncover (not that the process itself is useless, it is very useful for other things) I am almost positive we can turn this rather dire situation around easily in a matter of months, but we need to start doing it! It will not happen through pinning arbitrary numbers to arbitrary processes. Nova is big and complex enough that I don't think any one person is able to identify what we need to work on to make things better. That is one of the reasons why I have the project priorities patch [1] up. I would like to see nova as a team discuss and come up with what we think we need to focus on to get us back on track.
[1] https://review.openstack.org/#/c/112733/ Yes - I was thinking along similar lines to what you propose on that patch; too bad if the above sentence came across as implying I had some kind of cowboy one-man crusade in mind :) that is totally not what I meant. We need strong consensus on what is important for the project, and we need hands behind that (both hackers and reviewers). Having a good chunk of core devs not actually writing critical bits of code is a bad sign IMHO. I have some additions to your list of priorities which I will add as comments on the review above (with some other comments of my own), and we can discuss from there - sorry I missed this! I will likely do that instead of spamming further with another email, as the baseline seems sufficiently similar to where I stand. I will follow up with a more detailed email about what I believe we are missing, once the FF settles and I have applied some soothing creme to my burnout wounds, but currently my sentiment is: Contributing features to Nova nowadays SUCKS!!1 (even as a core reviewer) We _have_ to change that! Yes, I can agree with you on
[openstack-dev] [Nova][FFE] Feature freeze exception for virt-driver-numa-placement
Hi team, I am requesting the exception for the feature in the subject (find the spec at [1] and outstanding changes at [2]). Some reasons why we may want to grant it: First of all, all patches were approved in time and just lost the gate race. Rejecting it makes little sense really, as it has been commented on by a good chunk of the core team, most of the invasive stuff (db migrations for example) has already merged, and the few parts that may seem contentious have either been discussed and agreed upon [3], or can easily be addressed in subsequent bug fixes. It would be very beneficial to merge it so that we actually get real testing on the feature ASAP (scheduling features are not tested in the gate, so we need to rely on downstream/3rd party/user testing for those). Thanks, Nikola [1] http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/juno/virt-driver-numa-placement.rst [2] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/virt-driver-numa-placement,n,z [3] https://review.openstack.org/#/c/111782/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
On Thu, Sep 04, 2014 at 09:05:57AM +0000, Day, Phil wrote: -Original Message- From: Nikola Đipanov [mailto:ndipa...@redhat.com] Sent: 03 September 2014 10:50 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno [snip] I will follow up with a more detailed email about what I believe we are missing, once the FF settles and I have applied some soothing creme to my burnout wounds, but currently my sentiment is: Contributing features to Nova nowadays SUCKS!!1 (even as a core reviewer) We _have_ to change that! [snip] Has anyone looked at the review bandwidth issue from the perspective of whether there has been a change in the amount of time cores now spend contributing vs reviewing? I've certainly spent more time reviewing code in the last 2 dev cycles, not least because I need something to do while waiting for my own code submissions to get reviewed and merged (which feels like it is taking longer and longer). Despite the huge efforts in review we're barely denting the flow, and are having to get ever better at saying no to proposed features to cope. Maybe there's an opportunity to get cores to mentor non-cores to do the code production, freeing up review cycles? As a core dev I want to feel that I'm still able to do valuable code submission myself, while also doing the important code review work. IOW, I don't want to end up with the core team job requiring 100% of time to be spent on review cycles, as from my POV that ends up with little to no job satisfaction. Cores need to be able to maintain a balance between doing review and being able to scratch the itch in their own areas of coding interest. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][FFE] Feature freeze exception for virt-driver-numa-placement
On Thu, Sep 04, 2014 at 01:58:58PM +0200, Nikola Đipanov wrote: Hi team, I am requesting the exception for the feature in the subject (find the spec at [1] and outstanding changes at [2]). Some reasons why we may want to grant it: First of all, all patches were approved in time and just lost the gate race. Rejecting it makes little sense really, as it has been commented on by a good chunk of the core team, most of the invasive stuff (db migrations for example) has already merged, and the few parts that may seem contentious have either been discussed and agreed upon [3], or can easily be addressed in subsequent bug fixes. It would be very beneficial to merge it so that we actually get real testing on the feature ASAP (scheduling features are not tested in the gate, so we need to rely on downstream/3rd party/user testing for those). I think this NUMA work is a very important step forward for Nova in general, which will benefit our entire userbase of KVM deployments, and be especially useful to the NFV user group's needs. As such, I'll be one sponsor for the FFE. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
On 09/04/2014 03:36 AM, Dean Troyer wrote: On Wed, Sep 3, 2014 at 7:07 PM, Joe Gordon joe.gord...@gmail.com wrote: On Wed, Sep 3, 2014 at 2:50 AM, Nikola Đipanov ndipa...@redhat.com wrote: The reason many features including my own may not make the FF is not because there was not enough buy-in from the core team (let's be completely honest - I have 3+ other core members working for the same company that are by the nature of things easier to convince), but because of any of the following: I find the statement about having multiple cores at the same company very concerning. To quote Mark McLoughlin, It is assumed that all core team members are wearing their upstream hat and aren't there merely to represent their employers' interests [0]. Your statement appears to be in direct conflict with Mark's idea of what a core reviewer is, an idea that IMHO is one of the basic tenets of OpenStack development. FWIW I read Nikola's 'by nature of things' statement to be more of a representation of the higher-bandwidth communication and relationships with co-workers rather than for the company. I hope my reading is not wrong. Thanks for not reading too much into that sentence - yes, this is quite close to what I meant, and I used it to make a point of how I think we are focusing on the wrong thing (as already mentioned in the direct response to Joe). N. I know a while back some of the things I was trying to land in multiple projects really benefited from having both the relationships and high-bandwidth communication with 4 PTLs, three of whom were in the same room at the time. There is a perception problem - exactly what Mark also wrote about - when that happens off-line, and I think it is our responsibility (those advocating the reviews, and those responding to them) to note the outcome of those discussions on the record somewhere, IMO preferably in Gerrit. dt -- Dean Troyer dtro...@gmail.com ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting
On 09/04/2014 03:08 AM, Flavio Percoco wrote: [snip] I've been one of the consistent voices concerned about a hard requirement on adding NoSQL into the mix. So I'll explain that thinking a bit more. I feel like when the TC makes an integration decision, previously this has been about evaluating the project applying for integration, and if they met some specific criteria they were told about some time in the past. I think that's the wrong approach. It's a locally optimized approach that fails to ask the more interesting question: is OpenStack better as a whole if this is a mandatory component of OpenStack? Better being defined as technically better (more features, less janky code workarounds, less unexpected behavior from the stack). Better from the sense of easier or harder to run an actual cloud by our Operators (taking into account what kinds of moving parts they are now expected to manage). Better from the sense of a better user experience in interacting with OpenStack as a whole.
Better from a sense that the OpenStack release will experience fewer bugs, fewer unexpected cross-project interactions, and a greater overall feel of consistency, so that the OpenStack API feels like one thing. https://dague.net/2014/08/26/openstack-as-layers/ One of the interesting qualities of Layers 1 and 2 is that they all follow an AMQP + RDBMS pattern (excepting swift). You can have a very effective IaaS out of that stack. They are the things that you can provide pretty solid integration testing on (and if you look at where everything stood before the new TC mandates on testing / upgrade, that was basically what was getting integration tested). (Also note, I'll accept Barbican is probably in the wrong layer, and should be a Layer 2 service.) While large shops can afford to have a dedicated team to figure out how to make mongo or redis HA, provide monitoring, and have a DR plan for when a hurricane requires them to flip datacenters, that basically means OpenStack heads further down the path of being only for the big folks. I don't want OpenStack to be only for the big folks, I want OpenStack to be for folks of all sizes. I really do want all the local small colleges around here to have OpenStack clouds, because it's something that people believe they can do and manage. I know the people that work in these places; they all come out to the LUG I run.
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Hi Daniel, Thanks for putting together such a thoughtful piece - I probably need to re-read it a few times to take in everything you're saying, but a couple of thoughts that did occur to me: - I can see how this could help where a change is fully contained within a virt driver, but I wonder how many of those there really are? Of the things that I've seen go through recently nearly all also seem to touch the compute manager in some way, and a lot (like the NUMA changes) also have impacts on the scheduler. Isn't it going to make it harder to get any of those changes in if they have to be co-ordinated across two or more repos? - I think you hit the nail on the head in terms of the scope of Nova and how few people probably really understand all of it, but given the amount of trust that goes with being a core wouldn't it also be possible to make people cores on the understanding that they will only approve code in the areas they are expert in? It kind of feels that this happens to a large extent already; for example I don't see Chris or Ken'ichi taking on work outside of the API layer. It kind of feels as if, given a small amount of trust, we could have additional core reviewers focused on specific parts of the system without having to split up the code base, if that's where the problem is. Phil -Original Message- From: Daniel P. Berrange [mailto:berra...@redhat.com] Sent: 04 September 2014 11:24 To: OpenStack Development Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers [snip]
Re: [openstack-dev] [nova] FFE request v2-on-v3-api
On 09/04/2014 07:34 AM, Christopher Yeoh wrote: Hi, I'd like to request a FFE for 4 changesets from the v2-on-v3-api blueprint: https://review.openstack.org/#/c/113814/ https://review.openstack.org/#/c/115515/ https://review.openstack.org/#/c/115576/ https://review.openstack.org/#/c/11/ They have all already been approved and were in the gate for a while but just didn't quite make it through in time. So they shouldn't put any load on reviewers. Sponsoring cores: Kenichi Ohmichi John Garbutt Me Sign me up as a sponsor as well. I think the scope is highly constrained here, and risk to the rest of the project is low. -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][FFE] Feature freeze exception for virt-driver-numa-placement
On 09/04/2014 07:58 AM, Nikola Đipanov wrote: Hi team, I am requesting the exception for the feature in the subject (find the spec at [1] and outstanding changes at [2]). Some reasons why we may want to grant it: First of all, all patches were approved in time and just lost the gate race. Rejecting it makes little sense really, as it has been commented on by a good chunk of the core team, most of the invasive stuff (db migrations for example) has already merged, and the few parts that may seem contentious have either been discussed and agreed upon [3], or can easily be addressed in subsequent bug fixes. It would be very beneficial to merge it so that we actually get real testing on the feature ASAP (scheduling features are not tested in the gate, so we need to rely on downstream/3rd party/user testing for those). This statement bugs me. It seems kind of backwards to say we should merge a thing that we don't have a good upstream test plan on and put it in a release so that the testing will happen only in the downstream case. Anyway, not enough to -1 it, but enough to at least say something. -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] FFE request serial-ports
Hello, I would like to request a FFE for 4 changesets to complete the blueprint serial-ports. Topic on gerrit: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/serial-ports,n,z Blueprint on launchpad.net: https://blueprints.launchpad.net/nova/+spec/serial-ports They have already been approved but didn't get enough time to be merged by the gate. Sponsored by: Daniel Berrange Nikola Dipanov s. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] FFE request serial-ports
On Thu, Sep 04, 2014 at 02:42:11PM +0200, Sahid Orentino Ferdjaoui wrote: Hello, I would like to request a FFE for 4 changesets to complete the blueprint serial-ports. Topic on gerrit: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/serial-ports,n,z Blueprint on launchpad.net: https://blueprints.launchpad.net/nova/+spec/serial-ports They have already been approved but didn't get enough time to be merged by the gate. Sponsored by: Daniel Berrange Nikola Dipanov ACK, this has my blessing. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting
On 09/04/2014 01:15 PM, Chris Dent wrote: On Thu, 4 Sep 2014, Flavio Percoco wrote: Thanks for writing this up, interesting read. Thank you for your feedback :) Some comments in-line. 5. Ceilometer's recommended storage driver is still MongoDB, although Ceilometer has now support for sqlalchemy. (Please correct me if I'm wrong). For sake of reference: Yes, MongoDB is currently the recommended store and yes, sqlalchemy support is present. Until recently only sqlalchemy support was tested in the gate. Two big changes being developed in Juno related to storage: * Improved read and write performance in the sqlalchemy setup. * time series storage and Gnocchi: https://julien.danjou.info/blog/2014/openstack-ceilometer-the-gnocchi-experiment Awesome, thanks for clarifying this. [snip] As mentioned in the meeting on Tuesday, Zaqar is not reinventing message brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack flavor on top. [0] In my efforts to track this stuff I remain confused on the points in these two questions: https://wiki.openstack.org/wiki/Zaqar/Frequently_asked_questions#How_does_Zaqar_compare_to_oslo.messaging.3F https://wiki.openstack.org/wiki/Zaqar/Frequently_asked_questions#Is_Zaqar_an_under-cloud_or_an_over-cloud_service.3F What or where is the boundary between Zaqar and existing messaging infrastructure? Not just in terms of technology but also use cases? The answers above suggest it's not super solid on the use case side, notably: In addition, several projects have expressed interest in integrating with Zaqar in order to surface events... Instead of Zaqar doing what it does and instead of oslo.messaging abstracting RPC, why isn't the end goal a multi-tenant, multi-protocol event pool? Wouldn't that have the most flexibility in terms of ecosystem and scalability? If we put both features, multi-tenancy and multi-protocol, aside for a bit, we can simplify Zaqar's goal down to a messaging service for the cloud. I believe this is exactly where the line between Zaqar and other *queuing* technologies should be drawn. Zaqar is, in the end, a messaging service designed for the cloud, whereas existing queuing technologies were not designed for it. By cloud I don't mean performance, scalability or anything like that; I'm talking about providing a service that end-users of the cloud can consume. The fact that Zaqar is also ideal for the under-cloud is a plus. The service has been designed to provide a set of messaging features that perfectly serve use cases in both the under-cloud and over-cloud. If we add to that a multi-protocol transport layer with support for multi-tenancy, you'll get a queuing service that fits the needs of cloud providers and covers a broader set of use cases like, say, IoT. I forgot to add this link[0] to my previous email. Does the overview of the service, the key features and scope help clear things up a bit? Please let me know if they don't. I'm happy to provide more info if needed. [0] https://wiki.openstack.org/wiki/Zaqar#Overview In addition to the aforementioned concerns and comments, I also would like to share an etherpad that contains some use cases that other integrated projects have for Zaqar[0]. The list is not exhaustive and it'll contain more information before the next meeting. [0] https://etherpad.openstack.org/p/zaqar-integrated-projects-use-cases For these, what is Zaqar providing that oslo.messaging (and its still extant antecedents) does not? I'm not asking to naysay Zaqar, but to understand more clearly what's going on.
My interest here comes from a general interest in how events and notifications are handled throughout OpenStack. One of the reasons you would want to use Zaqar instead of oslo.messaging for, say, guest agents is that you don't want guest agents talking to your main messaging layer. Zaqar helps guest agents communicate with the main service in a more secure, authenticated and isolated way. If you were going to do that with oslo.messaging, you'd need to have separate virtual_hosts, exchanges and probably even users. These things cannot be easily configured without manual intervention. With Zaqar you can easily rely on your deployed cloud services - Keystone, Barbican and Zaqar, for example - to achieve such isolation and security. There are also other aspects of relying on the main messaging infrastructure that are worrisome for the use cases mentioned in that etherpad. For example, using OpenStack's main rabbitmq instance to communicate with guest agents would increase the workload on the infrastructure, which would require a better scaling strategy for it. I hope the above clears up your doubts. Thanks a lot for your feedback; it's useful to keep the discussion going and it helps everyone keep re-evaluating the goals and scope of the project. I hope other folks from the team will also chime in and share their thoughts. Cheers, Flavio --
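To make the guest-agent pattern concrete, a rough sketch of both sides talking through a Zaqar queue over plain HTTP follows. The endpoint, queue name and payloads are invented, and while the paths and headers follow the Zaqar v1 REST API as documented around this time, treat the details as assumptions to be checked against the current docs:

import json
import uuid

import requests

ZAQAR = "http://zaqar.example.com:8888/v1"   # assumed endpoint
HEADERS = {
    "Client-ID": str(uuid.uuid4()),          # required by the v1 API
    "X-Auth-Token": "KEYSTONE_TOKEN_HERE",
    "Content-Type": "application/json",
}


def post_command(queue, command):
    # Control-plane side: enqueue a command for the guest agent.
    requests.post("%s/queues/%s/messages" % (ZAQAR, queue),
                  headers=HEADERS,
                  data=json.dumps([{"ttl": 300, "body": command}]))


def claim_commands(queue):
    # Agent side: claim messages so no other worker processes them.
    resp = requests.post("%s/queues/%s/claims" % (ZAQAR, queue),
                         headers=HEADERS,
                         data=json.dumps({"ttl": 60, "grace": 30}))
    return resp.json() if resp.status_code == 201 else []


post_command("guest-agent-42", {"action": "create_user", "name": "bob"})
for msg in claim_commands("guest-agent-42"):
    print(msg["body"])

The point of the pattern is that the agent only ever holds a tenant-scoped keystone token for one queue, instead of credentials to the operator's message broker.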
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 12:14:39PM +0000, Day, Phil wrote: Hi Daniel, Thanks for putting together such a thoughtful piece - I probably need to re-read it a few times to take in everything you're saying, but a couple of thoughts that did occur to me: - I can see how this could help where a change is fully contained within a virt driver, but I wonder how many of those there really are? Of the things that I've seen go through recently nearly all also seem to touch the compute manager in some way, and a lot (like the NUMA changes) also have impacts on the scheduler. Isn't it going to make it harder to get any of those changes in if they have to be co-ordinated across two or more repos? Actually, in my experience of reviewing code this past cycle or two, I see a fairly significant portion of code that is entirely within the scope of a virt driver. I'm also seeing that people are refraining from actually doing changes to the virt drivers because of the burden of getting code past review, so what we see today is probably not even representative of the potential. There are certainly some high profile exceptions such as the NUMA work, or the new serial console work, where you're going to cross the repos. In such work we already try to break patches into isolated pieces, so the stuff touching common code is a separate commit from the stuff touching virt code. This is generally good practice to be encouraging. So, yes, it would need coordination across the repos to get the full work submitted, but I don't think that burden is unduly large compared to current practice. We do in fact already see this need for co-ordination in other ways. For example, API changes have parts that affect python-novaclient, and perhaps horizon too. Storage network changes often cross Neutron / Cinder and Nova. If we can reduce the burden on nova-core, the stuff going into the common codebase should stand more chance of getting review too. So overall yes, this is a valid point, but I'm not particularly concerned about the negative impacts of it, because we're already dealing with them today to a large extent. - I think you hit the nail on the head in terms of the scope of Nova and how few people probably really understand all of it, but given the amount of trust that goes with being a core wouldn't it also be possible to make people cores on the understanding that they will only approve code in the areas they are expert in? It kind of feels that this happens to a large extent already, for example I don't see Chris or Ken'ichi taking on work outside of the API layer. It kind of feels as if, given a small amount of trust, we could have additional core reviewers focused on specific parts of the system without having to split up the code base, if that's where the problem is. Yes, you are right that it happens to some extent, but I think it is quite a big jump to effectively scale up that amount of trust to a team that realistically would need to be 40+ people in size. Also this isn't solely about review bandwidth. One of the things I raised was that there are certain standards required for being part of nova, such as CI testing. If you can't meet that, you're forced into a sub-optimal development practice compared to the rest of nova, where you are out of tree and subject to being broken by Nova changes at any time, which is what Docker and Ironic have been facing.
Separate repos will also facilitate more targeted application of our testing resources, so vmware repo changes wouldn't need to suffer false failures from libvirt tempest jobs, and similarly vmware CI could be made gating for vmware without causing libvirt code to suffer instability. -Original Message- From: Daniel P. Berrange [mailto:berra...@redhat.com] Sent: 04 September 2014 11:24 To: OpenStack Development Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers [snip]
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Like I mentioned before, I think the only way out of the Nova death spiral is to split code and give control over it to smaller dedicated review teams. This is one way to do it. Thanks Dan for pulling this together :) A couple comments inline: Daniel P. Berrange wrote: [...] This is a crisis. A large crisis. In fact, if you got a moment, it's a twelve-storey crisis with a magnificent entrance hall, carpeting throughout, 24-hour portage, and an enormous sign on the roof, saying 'This Is a Large Crisis'. A large crisis requires a large plan. [...] I totally agree. We need a plan now, because we can't go through another cycle without a solution in sight. [...] This has quite a few implications for the way development would operate. - The Nova core team at least, would be voluntarily giving up a big amount of responsibility over the evolution of virt drivers. Due to human nature, people are not good at giving up power, so this may be painful to swallow. Realistically current nova core are not experts in most of the virt drivers to start with, and more importantly, we clearly do not have sufficient time to do a good job of review with everything submitted. Much of the current need for core review of virt drivers is to prevent the mis-use of a poorly defined virt driver API...which can be mitigated - See later point(s) - Nova core would/should not have automatic +2 over the virt driver repositories since it is unreasonable to assume they have the suitable domain knowledge for all virt drivers out there. People would of course be able to be members of multiple core teams. For example John G would naturally be nova-core and nova-xen-core. I would aim for nova-core and nova-libvirt-core, and so on. I do not want any +2 responsibility over VMware/Hyper-V/Docker drivers since they're not my area of expertise - I only look at them today because they have no other nova-core representation. - Not sure if it implies the Nova PTL would be solely focused on Nova common. e.g. would there continue to be one PTL over all virt driver implementation projects, or would each project have its own PTL. Maybe this is irrelevant if a Czars approach is chosen by virt driver projects for their work. I'd be inclined to say that a single PTL should stay as a figurehead to represent all the virt driver projects, acting as a point of contact to ensure we keep communication / co-operation between the drivers in sync. [...] At this point it may look like our current structure (programs, one PTL, single core teams...) prevents us from implementing that solution. I just want to say that in OpenStack, organizational structure reflects how we work, not the other way around. If we need to reorganize official project structure to work in smarter and long-term healthy ways, that's a really small price to pay. -- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] FFE request serial-ports
On 09/04/2014 01:42 PM, Sahid Orentino Ferdjaoui wrote: Hello, I would like to request a FFE for 4 changesets to complete the blueprint serial-ports. Topic on gerrit: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/serial-ports,n,z Blueprint on launchpad.net: https://blueprints.launchpad.net/nova/+spec/serial-ports They have already been approved but didn't get enough time to be merged by the gate. Sponsored by: Daniel Berrange Nikola Dipanov I'll sponsor this too, as I originally reviewed the set and approved it ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
2014-09-03 20:31 GMT+09:00 Gary Kotton gkot...@vmware.com: On 9/3/14, 12:50 PM, Nikola Đipanov ndipa...@redhat.com wrote: On 09/02/2014 09:23 PM, Michael Still wrote: On Tue, Sep 2, 2014 at 1:40 PM, Nikola Đipanov ndipa...@redhat.com wrote: On 09/02/2014 08:16 PM, Michael Still wrote: Hi. We're soon to hit feature freeze, as discussed in Thierry's recent email. I'd like to outline the process for requesting a freeze exception: * your code must already be up for review * your blueprint must have an approved spec * you need three (3) sponsoring cores for an exception to be granted Can core reviewers who have features up for review have this number lowered to two (2) sponsoring cores, as they in reality then need four (4) cores (since they themselves are one (1) core but cannot really vote) making it an order of magnitude more difficult for them to hit this checkbox? That's a lot of numbers in that there paragraph. Let me re-phrase your question... Can a core sponsor an exception they themselves propose? I don't have a problem with someone doing that, but you need to remember that does reduce the number of people who have agreed to review the code for that exception. Michael has correctly picked up on a hint of snark in my email, so let me explain where I was going with that: The reason many features including my own may not make the FF is not because there was not enough buy-in from the core team (let's be completely honest - I have 3+ other core members working for the same company that are by nature of things easier to convince), but because of any of the following: * Crippling technical debt in some of the key parts of the code * that we have not been acknowledging as such for a long time * which leads to proposed code being arbitrarily delayed once it makes the glaring flaws in the underlying infra apparent * and that the specs process has been completely and utterly useless in helping uncover (not that the process itself is useless, it is very useful for other things) I am almost positive we can turn this rather dire situation around easily in a matter of months, but we need to start doing it! It will not happen through pinning arbitrary numbers to arbitrary processes. I will follow up with a more detailed email about what I believe we are missing, once the FF settles and I have applied some soothing creme to my burnout wounds, but currently my sentiment is: Contributing features to Nova nowadays SUCKS!!1 (even as a core reviewer) We _have_ to change that! +1 Sadly what you have written above is true. The current process does not encourage new developers in Nova. I really think that we need to work on improving our community. I really think that maybe we should sit as a community at the summit and talk about this. That is an important point. I also have a feeling similar to what many people have said. I have a patch series which has been in progress since 2013-03-22, and some patches were again not merged in Juno-3 because of review bandwidth. When I started this work as a new contributor, I could not have imagined I would need this much time for it. Since then, through code reviews, I have sometimes felt an imbalance between patches. Some patches are very easy, like fixing a typo or removing an unused method. On the other hand, some patches are very difficult, like framework changes which affect long-lived features. However, we require two +2s for all patches, so even easy patches need a lot of review time. I think most new contributors post easy patches as a first step, but they might be feeling frustrated now. 
I think the number of good patches merged is more important than the number of code reviews. Could we consider a single +2 being enough to merge patches, case by case? Thanks Ken'ichi Ohmichi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting
On 09/04/2014 02:14 PM, Sean Dague wrote: On 09/04/2014 03:08 AM, Flavio Percoco wrote: Greetings, Last Tuesday the TC held the first graduation review for Zaqar. During the meeting some concerns arose. I've listed those concerns below with some comments hoping that it will help starting a discussion before the next meeting. In addition, I've added some comments about the project stability at the bottom and an etherpad link pointing to a list of use cases for Zaqar. # Concerns - Concern on operational burden of requiring NoSQL deploy expertise to the mix of openstack operational skills For those of you not familiar with Zaqar, it currently supports 2 nosql drivers - MongoDB and Redis - and those are the only 2 drivers it supports for now. This will require operators willing to use Zaqar to maintain a new (?) NoSQL technology in their system. Before expressing our thoughts on this matter, let me say that: 1. By removing the SQLAlchemy driver, we basically removed the chance for operators to use an already deployed OpenStack-technology 2. Zaqar won't be backed by any AMQP based messaging technology for now. Here's[0] a summary of the research the team (mostly done by Victoria) did during Juno 3. We (OpenStack) used to require Redis for the zmq matchmaker 4. We (OpenStack) also use memcached for caching and as the oslo caching lib becomes available - or a wrapper on top of dogpile.cache - Redis may be used in place of memcached in more and more deployments. 5. Ceilometer's recommended storage driver is still MongoDB, although Ceilometer has now support for sqlalchemy. (Please correct me if I'm wrong). That being said, it's obvious we already, to some extent, promote some NoSQL technologies. However, for the sake of the discussion, lets assume we don't. I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't keep avoiding these technologies. NoSQL technologies have been around for years and we should be prepared - including OpenStack operators - to support these technologies. Not every tool is good for all tasks - one of the reasons we removed the sqlalchemy driver in the first place - therefore it's impossible to keep an homogeneous environment for all services. With this, I'm not suggesting to ignore the risks and the extra burden this adds but, instead of attempting to avoid it completely by not evolving the stack of services we provide, we should probably work on defining a reasonable subset of NoSQL services we are OK with supporting. This will help making the burden smaller and it'll give operators the option to choose. [0] http://blog.flaper87.com/post/marconi-amqp-see-you-later/ I've been one of the consistent voices concerned about a hard requirement on adding NoSQL into the mix. So I'll explain that thinking a bit more. I feel like when the TC makes an integration decision previously this has been about evaluating the project applying for integration, and if they met some specific criteria they were told about some time in the past. I think that's the wrong approach. It's a locally optimized approach that fails to ask the more interesting question. Is OpenStack better as a whole if this is a mandatory component of OpenStack? Better being defined as technically better (more features, less janky code work arounds, less unexpected behavior from the stack). Better from the sense of easier or harder to run an actual cloud by our Operators (taking into account what kinds of moving parts they are now expected to manage). 
Better from the sense of a better user experience in interacting with OpenStack as a whole. Better from a sense that the OpenStack release will experience fewer bugs, fewer unexpected cross-project interactions, and a greater overall feel of consistency so that the OpenStack API feels like one thing. https://dague.net/2014/08/26/openstack-as-layers/ One of the interesting qualities of Layers 1 & 2 is that they all follow an AMQP + RDBMS pattern (excepting swift). You can have a very effective IaaS out of that stack. They are the things that you can provide pretty solid integration testing on (and if you look at where everything stood before the new TC mandates on testing / upgrade that was basically what was getting integration tested). (Also note, I'll accept Barbican is probably in the wrong layer, and should be a Layer 2 service.) While large shops can afford to have a dedicated team to figure out how to make mongo or redis HA, provide monitoring, have a DR plan for when a hurricane requires them to flip datacenters, that basically means OpenStack heads further down the path of being 'only for the big folks'. I don't want OpenStack to be only for the big folks, I want OpenStack to be for folks of all sizes. I really do want to have all the local small colleges around here have OpenStack clouds, because it's something that people believe they can do and manage. I know the
Re: [openstack-dev] [Nova][FFE] Feature freeze exception for virt-driver-numa-placement
On 09/04/2014 12:58 PM, Nikola Đipanov wrote: Hi team, I am requesting the exception for the feature from the subject (find specs at [1] and outstanding changes at [2]). Some reasons why we may want to grant it: First of all, all patches have been approved in time and just lost the gate race. Rejecting it makes little sense really, as it has been commented on by a good chunk of the core team, most of the invasive stuff (db migrations for example) has already merged, and the few parts that may seem contentious have either been discussed and agreed upon [3], or can easily be addressed in subsequent bug fixes. It would be very beneficial to merge it so that we actually get real testing on the feature ASAP (scheduling features are not tested in the gate so we need to rely on downstream/3rd party/user testing for those). Thanks, Nikola [1] http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/juno/virt-driver-numa-placement.rst [2] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/virt-driver-numa-placement,n,z [3] https://review.openstack.org/#/c/111782/ I'll sponsor this too, and I've already reviewed this set a few times ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
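For readers less familiar with how this feature is consumed once merged: the blueprint drives guest NUMA placement from flavor extra specs. A minimal sketch of the idea using python-novaclient follows; it is illustrative only - the credentials and endpoint are placeholders, and the exact extra-spec keys are defined by the spec in [1], of which hw:numa_nodes is the simplest.

    # Hedged sketch: ask for a guest split across 2 virtual NUMA nodes
    # by setting a flavor extra spec (per the virt-driver-numa-placement
    # spec). Auth values below are placeholders.
    from novaclient import client

    nova = client.Client("2", "admin", "secret", "admin",
                         "http://keystone.example.com:5000/v2.0")
    flavor = nova.flavors.find(name="m1.large")
    flavor.set_keys({"hw:numa_nodes": "2"})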
Re: [openstack-dev] [Nova][FFE] Feature freeze exception for virt-driver-numa-placement
On 09/04/2014 02:31 PM, Sean Dague wrote: On 09/04/2014 07:58 AM, Nikola Đipanov wrote: Hi team, I am requesting the exception for the feature from the subject (find specs at [1] and outstanding changes at [2]). Some reasons why we may want to grant it: First of all, all patches have been approved in time and just lost the gate race. Rejecting it makes little sense really, as it has been commented on by a good chunk of the core team, most of the invasive stuff (db migrations for example) has already merged, and the few parts that may seem contentious have either been discussed and agreed upon [3], or can easily be addressed in subsequent bug fixes. It would be very beneficial to merge it so that we actually get real testing on the feature ASAP (scheduling features are not tested in the gate so we need to rely on downstream/3rd party/user testing for those). This statement bugs me. It seems kind of backwards to say we should merge a thing that we don't have a good upstream test plan on and put it in a release so that the testing will happen only in the downstream case. The objective reality is that many other things have not had upstream testing for a long time (anything that requires more than 1 compute node in Nova for example, and any scheduling feature - as I mention clearly above), so I'm not sure how that is backwards from any reasonable point of view. Thanks to folks using them, they are still kept working and bugs get fixed. Getting features into the hands of users is extremely important... Anyway, not enough to -1 it, but enough to at least say something. .. but I do not want to get into the discussion about software testing here, not the place really. However, I do think it is very harmful to respond to an FFE request with such blanket statements and generalizations, if only for the message it sends to the contributors (that we really care more about upholding our own myths as a community than about users and features). N. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Kilo Cycle Goals Exercise
On 09/03/2014 11:37 AM, Joe Gordon wrote: As you all know, there have recently been several very active discussions around how to improve assorted aspects of our development process. One idea that was brought up is to come up with a list of cycle goals/project priorities for Kilo [0]. To that end, I would like to propose an exercise as discussed in the TC meeting yesterday [1]: Have anyone interested (especially TC members) come up with a list of what they think the project-wide Kilo cycle goals should be and post them on this thread by end of day Wednesday, September 10th. After which time we can begin discussing the results. The goal of this exercise is to help us see if our individual world views align with the greater community, and to get the ball rolling on a larger discussion of where as a project we should be focusing more time. best, Joe Gordon [0] http://lists.openstack.org/pipermail/openstack-dev/2014-August/041929.html [1] http://eavesdrop.openstack.org/meetings/tc/2014/tc.2014-09-02-20.04.log.html Here is my top 5 list: 1. Functional Testing in Integrated projects The justification for this is here - http://lists.openstack.org/pipermail/openstack-dev/2014-July/041057.html. We need projects to take more ownership of their functional testing so that by the time we get to integration testing we're not exposing really fundamental bugs like being unable to handle 2 requests at the same time. For Kilo: I think we can and should be able to make progress on this on all integrated projects, as well as the python clients (which are basically untested and often very broken). 2. Consistency in southbound interfaces (Logging first) Logging and notifications are south bound interfaces from OpenStack providing information to people, or machines, about what is going on. There is also a 3rd proposed south bound with osprofiler. For Kilo: I think it's reasonable to complete the logging standards and implement them. I expect notifications (which haven't quite kicked off) are going to take 2 cycles. I'd honestly *really* love to see a unification path for all the southbound parts, logging, osprofiler, notifications, because there is quite a bit of overlap in the instrumentation/annotation inside the main code for all of these. 3. API micro version path forward We have Cinder v2, Glance v2, Keystone v3. We've had them for a long time. When we started the Juno cycle Nova used *none* of them. And with good reason, as the path forward was actually pretty bumpy. Nova has been trying to create a v3 for 3 cycles, and that effort collapsed under its own weight. I think major API revisions in OpenStack are not actually possible any more, as there is too much inertia on existing interfaces. How to sanely and gradually evolve the OpenStack API is tremendously important, especially as a bunch of new projects are popping up that implement parts of it. We have the beginnings of a plan here in Nova, which now just needs a bunch of heavy lifting. For Kilo: A working microversion stack in at least one OpenStack service. Nova is probably closest, though Mark McClain wants to also take a spin on this in Neutron. I think if we could come up with a model that worked in both of those projects, we'd pick up some steam in making this long term approach across all of OpenStack. (A minimal sketch of what the header-based negotiation could look like follows after this mail.) 4. 
Post merge testing As explained here - http://lists.openstack.org/pipermail/openstack-dev/2014-July/041057.html we could probably get a lot more bang for our buck if we had a smaller # of integration configurations in the pre merge gate, and a much more expansive set of post merge jobs. For Kilo: I think this could be implemented; it probably needs more hands than it has right now. 5. Consistent OpenStack python SDK / clients I think the client projects being inside the server programs has not served us well, especially as the # of servers has expanded. We as a project need to figure out how to get the SDK / unified client effort moving forward faster. For Kilo: I'm not sure how close to done we could take this, but this needs to become a larger overall push for the project as a whole, as I think our user-exposed interface here is inhibiting adoption. -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
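To make point 3 above a bit more concrete, here is a minimal sketch of what header-based microversion negotiation could look like from a client's perspective. The header name and version value are assumptions based on the proposal under discussion at the time, not a settled Nova interface.

    # Hedged sketch of client-side microversion negotiation; the header
    # name below is an assumption, not a finalized interface.
    import requests

    resp = requests.get(
        "http://nova-api.example.com:8774/v2.1/servers",
        headers={
            "X-Auth-Token": "...",
            # Ask for a specific minor version; a server whose supported
            # min/max range excludes it would reject the request.
            "X-OpenStack-Nova-API-Version": "2.3",
        })
    # The response echoes the version actually applied, so callers can
    # detect older deployments and degrade gracefully.
    print(resp.headers.get("X-OpenStack-Nova-API-Version"))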
Re: [openstack-dev] [nova] FFE request serial-ports
On 09/04/2014 02:42 PM, Sahid Orentino Ferdjaoui wrote: Hello, I would like to request a FFE for 4 changesets to complete the blueprint serial-ports. Topic on gerrit: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/serial-ports,n,z Blueprint on launchpad.net: https://blueprints.launchpad.net/nova/+spec/serial-ports They have already been approved but didn't get enough time to be merged by the gate. Sponsored by: Daniel Berrange Nikola Dipanov This is also one of the ones that simply lost the gate race in the end, and I've reviewed several iterations of it, so +1 from me. N. s. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] Status of Neutron at Juno-3
On Thu, Sep 4, 2014 at 3:38 AM, Miguel Angel Ajo Pelayo mangel...@redhat.com wrote: I didn't know that we could ask for FFE, so I'd like to ask (if still in time) for: https://blueprints.launchpad.net/neutron/+spec/agent-child-processes-status https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:bp/agent-child-processes-status,n,z To get the ProcessMonitor implemented in the l3_agent and dhcp_agent at least. I believe the work is ready (I need to check the radvd respawn in the l3 agent). The ProcessMonitor class is already merged. The two remaining patches for this BP are about 65 and 200 LOC, so this is a relatively small change. In addition, since the initial patches merged in Juno-3, adding the code to monitor and restart the agents in the next two patches makes some sense. I'll add this to the list of BPs to discuss with ttx tomorrow. Thanks, Kyle Best regards, Miguel Ángel. - Original Message - On Wed, Sep 3, 2014 at 10:19 AM, Mark McClain m...@mcclain.xyz wrote: On Sep 3, 2014, at 11:04 AM, Brian Haley brian.ha...@hp.com wrote: On 09/03/2014 08:17 AM, Kyle Mestery wrote: Given how deep the merge queue is (146 currently), we've effectively reached feature freeze in Neutron now (likely other projects as well). So this morning I'm going to go through and remove BPs from Juno which did not make the merge window. I'll also be putting temporary -2s in the patches to ensure they don't slip in as well. I'm looking at FFEs for the high priority items which are close but didn't quite make it: https://blueprints.launchpad.net/neutron/+spec/l3-high-availability https://blueprints.launchpad.net/neutron/+spec/add-ipset-to-security https://blueprints.launchpad.net/neutron/+spec/security-group-rules-for-devices-rpc-call-refactor I guess I'll be the first to ask for an exception for a Medium since the code was originally completed in Icehouse: https://blueprints.launchpad.net/neutron/+spec/l3-metering-mgnt-ext The neutronclient-side code was committed in January, and the neutron side, https://review.openstack.org/#/c/70090, has had mostly positive reviews since then. I've really just spent the last week re-basing it as things moved along. +1 for FFE. I think this is good community work that fell through the cracks. I agree, and I've marked it as RC1 now. I'll sort through these with ttx on Friday and get more clarity on its official status. Thanks, Kyle mark ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
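For context on what the remaining patches add, the mechanism boils down to a watchdog over the child processes the agents spawn (dnsmasq, radvd, ...). A toy sketch of the idea - illustrative only, and not Neutron's actual ProcessMonitor API:

    # Minimal child-process watchdog: poll each managed child and
    # respawn it if it has exited. Illustrative only.
    import subprocess
    import time

    def watch(commands, interval=5):
        # e.g. watch([("dnsmasq", "--keep-in-foreground")])
        procs = {cmd: subprocess.Popen(cmd) for cmd in commands}
        while True:
            for cmd, proc in procs.items():
                if proc.poll() is not None:  # the child has exited
                    procs[cmd] = subprocess.Popen(cmd)  # respawn it
            time.sleep(interval)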
[openstack-dev] [vmware][nova][FFE] vmware-spawn-refactor
I'd like to request a FFE for the remaining changes from vmware-spawn-refactor. They are: https://review.openstack.org/#/c/109754/ https://review.openstack.org/#/c/109755/ https://review.openstack.org/#/c/114817/ https://review.openstack.org/#/c/117467/ https://review.openstack.org/#/c/117283/ https://review.openstack.org/#/c/98322/ All but the last had +A, and were in the gate at the time it was closed. The last had not yet been approved, but is ready for core review. It has recently had some orthogonal changes split out to simplify it considerably. It is largely a code motion patch, and has been given +1 by VMware CI multiple times. Matt -- Matthew Booth Red Hat Engineering, Virtualisation Team Phone: +442070094448 (UK) GPG ID: D33C3490 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] FFE request v2-on-v3-api
2014-09-04 20:34 GMT+09:00 Christopher Yeoh cbky...@gmail.com: Hi, I'd like to request a FFE for 4 changesets from the v2-on-v3-api blueprint: https://review.openstack.org/#/c/113814/ https://review.openstack.org/#/c/115515/ https://review.openstack.org/#/c/115576/ https://review.openstack.org/#/c/11/ They have all already been approved and were in the gate for a while but just didn't quite make it through in time. So they shouldn't put any load on reviewers. Sponsoring cores: Kenichi Ohmichi John Garbutt Me Yeah, I am happy to support this work. Thanks Ken'ichi Ohmichi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Treating notifications as a contract (CADF)
Yesterday, we had a great conversation with Matt Rutkowski from IBM, one of the authors of the CADF spec. I was having a disconnect on what CADF offers and got it clarified. My assumption was CADF was a set of transformation/extraction rules for taking data from existing data structures and defining them as well-known things. For example, CADF needs to know who sent this notification. I thought CADF would give us a means to point at an existing data structure and say that's where you find it. But I was wrong. CADF is a full-on schema/data structure of its own. It would be a fork-lift replacement for our existing notifications. However, if your service hasn't really adopted notifications yet (green field) or you can handle a fork-lift replacement, CADF is a good option. There are a few gotchas though. If you have required data that is outside of the CADF spec, it would need to go in the attachment section of the notification and that still needs a separate schema to define it. Matt's team is very receptive to extending the spec to include these special cases though. Anyway, I've written up all the options (as I see them) [1] with the advantages/disadvantages of each approach. It's just a strawman, so bend/spindle/mutilate. Look forward to feedback! -S [1] https://wiki.openstack.org/wiki/NotificationsAndCADF On 9/3/2014 12:30 PM, Sandy Walsh wrote: On 9/3/2014 11:32 AM, Chris Dent wrote: On Wed, 3 Sep 2014, Sandy Walsh wrote: We're chatting with IBM about CADF and getting down to specifics on their applicability to notifications. Once I get StackTach.v3 into production I'm keen to get started on revisiting the notification format and oslo.messaging support for notifications. Perhaps a hangout for those keenly interested in doing something about this? That seems like a good idea. I'd like to be a part of that. Unfortunately I won't be at the summit but would like to contribute what I can before and after. I took some notes on this a few weeks ago and extracted what seemed to be the two main threads or ideas that were revealed by the conversation that happened in this thread: * At the micro level have versioned schema for notifications such that one end can declare I am sending version X of notification foo.bar.Y and the other end can effectively deal. Yes, that's table-stakes I think. Putting structure around the payload section. Beyond type and version we should be able to attach meta information like public/private visibility and perhaps hints for external mapping (this trait maps to that trait in CADF, for example). * At the macro level standardize a packaging or envelope of all notifications so that they can be consumed by very similar code. That is: constrain the notifications in some way so we can also constrain the consumer code. That's the intention of what we have now. The top level traits are standard, the payload is open. We really only require: message_id, timestamp and event_type. For auditing we need to cover Who, What, When, Where, Why, OnWhat, OnWhere, FromWhere. These ideas serve two different purposes: One is to ensure that existing notification use cases are satisfied with robustness and provide a contract between two endpoints. The other is to allow a fecund notification environment that allows and enables many participants. Good goals. When Producer and Consumer know what to expect, things are good ... I know to find the Instance ID here. 
When the consumer wants to deal with a notification as a generic object, things get tricky (where do I find the instance ID in the payload? what is the image type? is this an error notification?). Basically, how do we define the principal artifacts for each service and grant the consumer easy/consistent access to them? (like the 7-W's above) I'd really like to find a way to solve that problem. Is that a good summary? What did I leave out or get wrong? Great start! Let's keep it simple and do-able. We should also review the oslo.messaging notification api ... I've got some concerns we've lost our way there. -S ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
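To ground the envelope-vs-payload distinction above, here is a sketch of a versioned notification carrying the required standard traits Sandy lists (message_id, timestamp, event_type). Everything beyond those three fields is an illustrative assumption, not an agreed oslo.messaging or CADF format:

    # Hedged sketch: a stable envelope (the contract) wrapping a
    # schema-versioned payload. Field names beyond the three required
    # traits are illustrative only.
    import datetime
    import uuid

    notification = {
        # Required, stable envelope traits:
        "message_id": str(uuid.uuid4()),
        "timestamp": datetime.datetime.utcnow().isoformat(),
        "event_type": "compute.instance.create.end",
        # A version so consumers know which payload schema applies:
        "payload_version": "1.0",
        # Free-form payload; the who/what/where audit traits would live
        # here or be mapped out to CADF.
        "payload": {
            "instance_id": "...",
            "tenant_id": "...",
        },
    }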
Re: [openstack-dev] [Nova][FFE] Feature freeze exception for virt-driver-numa-placement
On Thu, Sep 04, 2014 at 03:07:24PM +0200, Nikola Đipanov wrote: On 09/04/2014 02:31 PM, Sean Dague wrote: On 09/04/2014 07:58 AM, Nikola Đipanov wrote: Hi team, I am requesting the exception for the feature from the subject (find specs at [1] and outstanding changes at [2]). Some reasons why we may want to grant it: First of all, all patches have been approved in time and just lost the gate race. Rejecting it makes little sense really, as it has been commented on by a good chunk of the core team, most of the invasive stuff (db migrations for example) has already merged, and the few parts that may seem contentious have either been discussed and agreed upon [3], or can easily be addressed in subsequent bug fixes. It would be very beneficial to merge it so that we actually get real testing on the feature ASAP (scheduling features are not tested in the gate so we need to rely on downstream/3rd party/user testing for those). This statement bugs me. It seems kind of backwards to say we should merge a thing that we don't have a good upstream test plan on and put it in a release so that the testing will happen only in the downstream case. The objective reality is that many other things have not had upstream testing for a long time (anything that requires more than 1 compute node in Nova for example, and any scheduling feature - as I mention clearly above), so I'm not sure how that is backwards from any reasonable point of view. More critically, with the NUMA feature, AFAIK, there is no public cloud in existence which exposes NUMA to the guest. So unless someone is willing to pay for 100s of bare metal servers to run tempest on, I don't know of any infrastructure on which we can test NUMA today. Of course once we include NUMA features in Nova and release Nova, then the Rackspace and/or HP clouds will be in a position to start considering how and when they might expose NUMA features for instances they host. So by including it in Nova today, we would be helping move towards a future where we will be able to run tempest against NUMA features. Blocking NUMA from Nova for lack of automated testing will leave us trapped in a chicken and egg scenario, potentially forever. That's not in anyone's best interests IMHO Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] [feature freeze exception] FFE for libvirt-start-lxc-from-block-devices
Hello, I would like to ask for an exception for the libvirt-start-lxc-from-block-devices feature. It has previously been pushed from Icehouse to Juno. The spec [1] has been approved. One of the patches is a bug fix. Another patch had already been approved but failed in the gate. All patches have a +2 from Daniel Berrange. The list of the remaining patches is in [2]. [1] https://review.openstack.org/#/c/88062 [2] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/libvirt-start-lxc-from-block-devices,n,z Thank you, Vladik ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [feature freeze exception] FFE for libvirt-start-lxc-from-block-devices
On Thu, Sep 04, 2014 at 03:22:14PM +0200, Vladik Romanovsky wrote: Hello, I would like to ask for an extension for libvirt-start-lxc-from-block-devices feature. It has been previously pushed from Ice house to Juno. The spec [1] has been approved. One of the patches is a bug fix. Another patch has been already approved and failed in the gate. All patches has a +2 from Daniel Berrange. The list of the remaining patches are in [2]. [1] https://review.openstack.org/#/c/88062 [2] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/libvirt-start-lxc-from-block-devices,n,z The first two patches there are really both just bug fixes, so should not be -2'd at all right now. The last patch is sufficiently trivial that I'm happy to sponsor FFE. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
On 9/4/2014 4:21 AM, Day, Phil wrote: One final note: the specs referenced above didn't get approved until Spec Freeze, which seemed to leave me with less time to implement things. In fact, it seemed that a lot of specs didn't get approved until spec freeze. Perhaps if we had more staggered approval of specs, we'd have more staggered submission of patches, and thus less of a sudden influx of patches in the couple weeks before feature proposal freeze. Yeah, I think the specs were getting approved too late in the cycle. I was actually surprised at how far out the schedules were going in allowing things in and then allowing exceptions after that. Hopefully the ideas around priorities/slots/runways will help stagger some of this also. I think there is a problem with the pattern that seemed to emerge in June where the J.1 period was taken up with spec review (a lot of good reviews happened early in that period, but the approvals kind of came in a lump at the end), meaning that the implementation work itself only seemed to really kick in during J.2 - and, not surprisingly given the complexity of some of the changes, ran late into J.3. We also, as previously noted, didn't do any prioritization between those specs that were approved - so it was always going to be a race to see who managed to get code up for review first. It kind of feels to me as if the ideal model would be if we were doing spec review for K now (i.e. during the FF / stabilization period) so that we hit Paris with a lot of the input already registered and a clear idea of the range of things folks want to do. We shouldn't really have to ask for session suggestions for the summit - they should be something that can be extracted from the proposed specs (maybe we do voting across the specs or something like that). In that way the summit would be able to confirm the list of specs for K and the priority order. With the current state of the review queue maybe we can't quite hit this pattern for K, but it would be worth aspiring to for L? Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev I like the idea of having our ducks somewhat in a row for the summit so we can hash out details in design sessions on high-priority specs and reserve time for figuring out what the priorities are. I think that would go a long way in fixing some of the frustrations in the other thread about the mid-cycle meetups being the place where blueprint issues are hashed out rather than the summit, and the design sessions at the summit not feeling productive. But as noted, there is also a feeling right now of focusing on Juno to get that out the door before anyone starts getting distracted with reviewing Kilo specs. And I suppose once Juno is finished no one is going to want to talk about Kilo for awhile due to burnout. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][FFE] Feature freeze exception for virt-driver-numa-placement
On 09/04/2014 09:21 AM, Daniel P. Berrange wrote: On Thu, Sep 04, 2014 at 03:07:24PM +0200, Nikola Đipanov wrote: On 09/04/2014 02:31 PM, Sean Dague wrote: On 09/04/2014 07:58 AM, Nikola Đipanov wrote: Hi team, I am requesting the exception for the feature from the subject (find specs at [1] and outstanding changes at [2]). Some reasons why we may want to grant it: First of all, all patches have been approved in time and just lost the gate race. Rejecting it makes little sense really, as it has been commented on by a good chunk of the core team, most of the invasive stuff (db migrations for example) has already merged, and the few parts that may seem contentious have either been discussed and agreed upon [3], or can easily be addressed in subsequent bug fixes. It would be very beneficial to merge it so that we actually get real testing on the feature ASAP (scheduling features are not tested in the gate so we need to rely on downstream/3rd party/user testing for those). This statement bugs me. It seems kind of backwards to say we should merge a thing that we don't have a good upstream test plan on and put it in a release so that the testing will happen only in the downstream case. The objective reality is that many other things have not had upstream testing for a long time (anything that requires more than 1 compute node in Nova for example, and any scheduling feature - as I mention clearly above), so I'm not sure how that is backwards from any reasonable point of view. More critically, with the NUMA feature, AFAIK, there is no public cloud in existence which exposes NUMA to the guest. So unless someone is willing to pay for 100s of bare metal servers to run tempest on, I don't know of any infrastructure on which we can test NUMA today. Of course once we include NUMA features in Nova and release Nova, then the Rackspace and/or HP clouds will be in a position to start considering how and when they might expose NUMA features for instances they host. So by including it in Nova today, we would be helping move towards a future where we will be able to run tempest against NUMA features. Blocking NUMA from Nova for lack of automated testing will leave us trapped in a chicken and egg scenario, potentially forever. That's not in anyone's best interests IMHO The spec specifically calls out the scheduler piece being the part that probably most needs to be tested, especially at large scales here. Those pieces don't need Tempest to test them; they need more solid functional tests around the scheduler under those circumstances. There are interesting (and not all that difficult) ways to do this given the resources we have, which don't seem to be getting explored, which is my concern. -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
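As an illustration of the kind of test meant here - exercising scheduling logic against synthetic host state, with no Tempest run or real hardware - consider this hedged sketch; the class and function names are stand-ins, not Nova's actual test API:

    # Toy functional test for a scheduler filter: feed it many fake
    # hosts and assert on the outcome. Names are illustrative.
    class FakeHostState(object):
        def __init__(self, free_ram_mb):
            self.free_ram_mb = free_ram_mb

    def ram_filter_passes(host, requested_mb):
        # Stand-in for a real filter's host_passes() logic.
        return host.free_ram_mb >= requested_mb

    def test_filter_handles_many_hosts():
        hosts = [FakeHostState(free_ram_mb=i % 4096) for i in range(10000)]
        survivors = [h for h in hosts if ram_filter_passes(h, 2048)]
        assert survivors  # some hosts qualify
        assert all(h.free_ram_mb >= 2048 for h in survivors)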
[openstack-dev] [Nova][FFE] v3-api-schema
Hi, I'd like to request an FFE for the v3-api-schema patches. The list is the following: https://review.openstack.org/#/c/67428/ https://review.openstack.org/#/c/103437/ https://review.openstack.org/#/c/103436/ https://review.openstack.org/#/c/66783/ One of them has already been approved, but is blocked from merging by a temporary -2. The others have each gotten one +2 on the current patch set. This work will make the v2.1 API more robust. Thanks Ken'ichi Ohmichi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
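For readers unfamiliar with the blueprint, these patches attach JSON-Schema definitions to API methods so request bodies are validated up front and rejected with a clear 400 instead of failing deep in the stack. A minimal sketch of the pattern - the schema shown is illustrative, not the exact one proposed:

    # Hedged sketch of JSON-Schema request validation; the schema below
    # is illustrative only.
    import jsonschema

    flavor_create_schema = {
        "type": "object",
        "properties": {
            "flavor": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "minLength": 1},
                    "ram": {"type": "integer", "minimum": 1},
                },
                "required": ["name", "ram"],
            },
        },
        "required": ["flavor"],
    }

    body = {"flavor": {"name": "tiny", "ram": 0}}
    try:
        jsonschema.validate(body, flavor_create_schema)
    except jsonschema.ValidationError as e:
        # The API layer would turn this into a 400 with this message.
        print("bad request: %s" % e.message)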
Re: [openstack-dev] [Nova][FFE] v3-api-schema
On 09/04/2014 09:30 AM, Ken'ichi Ohmichi wrote: Hi, I'd like to request an FFE for the v3-api-schema patches. The list is the following: https://review.openstack.org/#/c/67428/ https://review.openstack.org/#/c/103437/ https://review.openstack.org/#/c/103436/ https://review.openstack.org/#/c/66783/ One of them has already been approved, but is blocked from merging by a temporary -2. The others have each gotten one +2 on the current patch set. This work will make the v2.1 API more robust. Thanks Ken'ichi Ohmichi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Happy to co-sponsor these; they have very minimal risk to the rest of Nova. I just went and reviewed the patches and added my +2 to them, so they are ready to merge should the FFE be approved. -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Hi, I do not think that Nova is in a death spiral. I just think that the current way of working is strangling the project. I do not understand why we need to split drivers out of the core project. Why not have the ability to provide 'core review' status to people for reviewing those parts of the code? We have enough talented people in OpenStack to be able to write a driver above gerrit to enable that. Fragmenting the project will be very unhealthy. For what it is worth, having a release date at the end of a vacation is really bad. Look at the numbers: http://stackalytics.com/report/contribution/nova-group/30 Thanks Gary On 9/4/14, 3:59 PM, Thierry Carrez thie...@openstack.org wrote: Like I mentioned before, I think the only way out of the Nova death spiral is to split code and give control over it to smaller dedicated review teams. This is one way to do it. Thanks Dan for pulling this together :) A couple comments inline: Daniel P. Berrange wrote: [...] This is a crisis. A large crisis. In fact, if you got a moment, it's a twelve-storey crisis with a magnificent entrance hall, carpeting throughout, 24-hour portage, and an enormous sign on the roof, saying 'This Is a Large Crisis'. A large crisis requires a large plan. [...] I totally agree. We need a plan now, because we can't go through another cycle without a solution in sight. [...] This has quite a few implications for the way development would operate. - The Nova core team at least, would be voluntarily giving up a big amount of responsibility over the evolution of virt drivers. Due to human nature, people are not good at giving up power, so this may be painful to swallow. Realistically current nova core are not experts in most of the virt drivers to start with, and more importantly, we clearly do not have sufficient time to do a good job of review with everything submitted. Much of the current need for core review of virt drivers is to prevent the mis-use of a poorly defined virt driver API...which can be mitigated - See later point(s) - Nova core would/should not have automatic +2 over the virt driver repositories since it is unreasonable to assume they have the suitable domain knowledge for all virt drivers out there. People would of course be able to be members of multiple core teams. For example John G would naturally be nova-core and nova-xen-core. I would aim for nova-core and nova-libvirt-core, and so on. I do not want any +2 responsibility over VMware/Hyper-V/Docker drivers since they're not my area of expertise - I only look at them today because they have no other nova-core representation. - Not sure if it implies the Nova PTL would be solely focused on Nova common. e.g. would there continue to be one PTL over all virt driver implementation projects, or would each project have its own PTL. Maybe this is irrelevant if a Czars approach is chosen by virt driver projects for their work. I'd be inclined to say that a single PTL should stay as a figurehead to represent all the virt driver projects, acting as a point of contact to ensure we keep communication / co-operation between the drivers in sync. [...] At this point it may look like our current structure (programs, one PTL, single core teams...) prevents us from implementing that solution. I just want to say that in OpenStack, organizational structure reflects how we work, not the other way around. If we need to reorganize official project structure to work in smarter and long-term healthy ways, that's a really small price to pay. 
-- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [vmware][nova][FFE] vmware-spawn-refactor
On Thu, Sep 04, 2014 at 02:09:26PM +0100, Matthew Booth wrote: I'd like to request a FFE for the remaining changes from vmware-spawn-refactor. They are: https://review.openstack.org/#/c/109754/ https://review.openstack.org/#/c/109755/ https://review.openstack.org/#/c/114817/ https://review.openstack.org/#/c/117467/ https://review.openstack.org/#/c/117283/ https://review.openstack.org/#/c/98322/ All but the last had +A, and were in the gate at the time it was closed. The last had not yet been approved, but is ready for core review. It has recently had some orthogonal changes split out to simplify it considerably. It is largely a code motion patch, and has been given +1 by VMware CI multiple times. They're all internal to the VMware driver and have multiple ACKs from VMware maintainers as well as core, so they don't require extra review time. So I think it is a reasonable request. ACK, I'll sponsor it. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][FFE] Feature freeze exception for virt-driver-numa-placement
On 9/4/14, 4:30 PM, Sean Dague s...@dague.net wrote: On 09/04/2014 09:21 AM, Daniel P. Berrange wrote: On Thu, Sep 04, 2014 at 03:07:24PM +0200, Nikola Đipanov wrote: On 09/04/2014 02:31 PM, Sean Dague wrote: On 09/04/2014 07:58 AM, Nikola Đipanov wrote: Hi team, I am requesting the exception for the feature from the subject (find specs at [1] and outstanding changes at [2]). Some reasons why we may want to grant it: First of all, all patches have been approved in time and just lost the gate race. Rejecting it makes little sense really, as it has been commented on by a good chunk of the core team, most of the invasive stuff (db migrations for example) has already merged, and the few parts that may seem contentious have either been discussed and agreed upon [3], or can easily be addressed in subsequent bug fixes. It would be very beneficial to merge it so that we actually get real testing on the feature ASAP (scheduling features are not tested in the gate so we need to rely on downstream/3rd party/user testing for those). This statement bugs me. It seems kind of backwards to say we should merge a thing that we don't have a good upstream test plan on and put it in a release so that the testing will happen only in the downstream case. The objective reality is that many other things have not had upstream testing for a long time (anything that requires more than 1 compute node in Nova for example, and any scheduling feature - as I mention clearly above), so I'm not sure how that is backwards from any reasonable point of view. More critically, with the NUMA feature, AFAIK, there is no public cloud in existence which exposes NUMA to the guest. So unless someone is willing to pay for 100s of bare metal servers to run tempest on, I don't know of any infrastructure on which we can test NUMA today. Of course once we include NUMA features in Nova and release Nova, then the Rackspace and/or HP clouds will be in a position to start considering how and when they might expose NUMA features for instances they host. So by including it in Nova today, we would be helping move towards a future where we will be able to run tempest against NUMA features. Blocking NUMA from Nova for lack of automated testing will leave us trapped in a chicken and egg scenario, potentially forever. That's not in anyone's best interests IMHO The spec specifically calls out the scheduler piece being the part that probably most needs to be tested, especially at large scales here. Those pieces don't need Tempest to test them; they need more solid functional tests around the scheduler under those circumstances. There are interesting (and not all that difficult) ways to do this given the resources we have, which don't seem to be getting explored, which is my concern. I share your concern with this feature. I stated it on review https://review.openstack.org/#/c/115007/ in PS 16. I think that we have well-known scheduling issues and these will be accentuated by a feature like this. My feeling is that this feature and the PCI feature are both going to be problematic at scale. My reservation is that even when the feature is not enabled, a lot of unnecessary data will be passed between hosts and the scheduler (this is why we should have gone with extensible resources - but that is opening a can of worms). Having said that, I think that Nova needs features like this. I am in favor of moving ahead with this for a number of reasons: 1. The filter is not enabled by default 2. We can fix things moving forwards So I am +1 on this. 
If we can document that it is 'experimental' or 'use at your own risk' then I am +2. But given that the admin needs to configure the filter, she/he knows it is at their own risk. A luta continua -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
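For reference, the opt-in Gary describes amounts to adding the new filter to the scheduler's filter list in nova.conf. A hedged sketch - the surrounding filter list is illustrative and will vary per deployment:

    [DEFAULT]
    # Illustrative list; NUMATopologyFilter is the new opt-in piece.
    scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,NUMATopologyFilter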
Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?
On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote: [...] How would people feel about turning [auto-abandon] back on? A lot of reviewers (myself among them) feel auto-abandon was a cold and emotionless way to provide feedback on a change. Especially on high-change-volume projects where core reviewers may at times get sucked into triaging other problems for long enough that the auto-abandoner kills lots of legitimate changes (possibly from new contributors who will get even more disgusted by this than by the silence itself and walk away indefinitely with the impression that we really aren't a welcoming development community at all). Can it be done on a per-project basis? It can, by running your own... but again it seems far better for core reviewers to decide if a change has potential or needs to be abandoned--that way there's an accountable human making that deliberate choice rather than the review team hiding behind an automated process so that no one is to blame for hurt feelings besides the infra operators who are enforcing this draconian measure for you. To make the whole process a little friendlier we could increase the time frame from 1 week to 2. <snark>How about just automatically abandoning any new change as soon as it's published? If the contributor really feels it's important they'll unabandon it.</snark> -- Jeremy Stanley ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
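On the per-project question: any team can run its own sweep without infra turning the global auto-abandoner back on. A hedged sketch using Gerrit's REST API to list changes untouched for the friendlier two-week window (the project name is an example; whether to abandon anything remains a human's call):

    # List open changes idle for 2+ weeks so a human can decide.
    import json
    import requests

    resp = requests.get(
        "https://review.openstack.org/changes/",
        params={"q": "project:openstack/tripleo-incubator "
                     "status:open age:2w"})
    # Gerrit prefixes JSON responses with )]}' to defeat XSSI.
    for change in json.loads(resp.text[4:]):
        print("%s %s" % (change["_number"], change["subject"]))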
Re: [openstack-dev] [vmware][nova][FFE] vmware-spawn-refactor
On 04/09/14 14:46, Daniel P. Berrange wrote: On Thu, Sep 04, 2014 at 02:09:26PM +0100, Matthew Booth wrote: I'd like to request a FFE for the remaining changes from vmware-spawn-refactor. They are: https://review.openstack.org/#/c/109754/ https://review.openstack.org/#/c/109755/ https://review.openstack.org/#/c/114817/ https://review.openstack.org/#/c/117467/ https://review.openstack.org/#/c/117283/ https://review.openstack.org/#/c/98322/ All but the last had +A, and were in the gate at the time it was closed. The last had not yet been approved, but is ready for core review. It has recently had some orthogonal changes split out to simplify it considerably. It is largely a code motion patch, and has been given +1 by VMware CI multiple times. They're all internal to the VMWare driver, have multiple ACKs from VMWare maintainers as well as core, so don't require extra review time. So I think it is reasonable request. ACK, I'll sponsor it. Thanks, Dan. John Garbutt has also said he'll sponsor the previously approved patches, so that's 2. Matt -- Matthew Booth Red Hat Engineering, Virtualisation Team Phone: +442070094448 (UK) GPG ID: D33C3490 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Design Summit reloaded
Hi everyone, I've been thinking about what changes we can bring to the Design Summit format to make it more productive. I've heard the feedback from the mid-cycle meetups and would like to apply some of those ideas for Paris, within the constraints we have (already booked space and time). Here is something we could do: Day 1. Cross-project sessions / incubated projects / other projects I think that worked well last time. 3 parallel rooms where we can address top cross-project questions, discuss the results of the various experiments we conducted during Juno. Don't hesitate to schedule 2 slots for discussions, so that we have time to get to the bottom of those issues. Incubated projects (and maybe other projects, if space allows) occupy the remaining space on day 1, and could occupy pods on the other days. Day 2 and Day 3. Scheduled sessions for various programs That's our traditional scheduled space. We'll have 33% fewer slots available. So, rather than trying to cover the whole scope, the idea would be to focus those sessions on specific issues which really require face-to-face discussion (which can't be solved on the ML or using spec discussion) *or* require a lot of user feedback. That way, appearing in the general schedule is very helpful. This will require us to be a lot stricter on what we accept there and what we don't -- we won't have space for courtesy sessions anymore, and traditional/unnecessary sessions (like my traditional release schedule one) should just move to the mailing-list. Day 4. Contributors meetups On the last day, we could try to split the space so that we can conduct parallel midcycle-meetup-like contributors gatherings, with no time boundaries and an open agenda. Large projects could get a full day, smaller projects would get half a day (but could continue the discussion in a local bar). Ideally that meetup would end with some alignment on release goals, but the idea is to make the best of that time together to solve the issues you have. Friday would finish with the design summit feedback session, for those who are still around. I think this proposal makes the best use of our setup: discuss clear cross-project issues, address key specific topics which need face-to-face time and broader attendance, then try to replicate the success of midcycle meetup-like open unscheduled time to discuss whatever is hot at this point. There are still details to work out (is it possible to split the space, should we use the usual design summit CFP website to organize the scheduled time...), but I would first like to have your feedback on this format. Also if you have alternative proposals that would make better use of our 4 days, let me know. Apologies for jumping on this thread late. I'm all for the idea of accommodating a more fluid form of project-specific discussion, with the schedule emerging in a dynamic way. But one aspect of the proposed summit redesign that isn't fully clear to me is the cross-over between the new Contributors meetups and the Project pods that we tried out for the first time in Atlanta. That seemed, to me at least, to be a very useful experiment. In fact, "parallel midcycle-meetup-like contributors gatherings, with no time boundaries and an open agenda" sounds like quite a good description of how some projects used their pods in ATL. 
The advantages of the pods approach, in my mind, included: * no requirement for reducing the number of design sessions slots, as the pod time ran in parallel with the design session tracks of other projects * depending on where in the week the project track occurred, the pod time could include a chunk of scene-setting/preparation discussion *in advance of* the more structured design sessions * on a related theme, the pods did not rely on the graveyard shift at the backend of the summit when folks tend to hit their Friday afternoon brain-full state Am I missing some compelling advantage of moving all these emergent project-specific meetups to the Friday? Cheers, Eoghan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [vmware][nova][FFE] vmware-spawn-refactor
On 09/04/2014 03:46 PM, Daniel P. Berrange wrote: On Thu, Sep 04, 2014 at 02:09:26PM +0100, Matthew Booth wrote: I'd like to request an FFE for the remaining changes from vmware-spawn-refactor. They are: https://review.openstack.org/#/c/109754/ https://review.openstack.org/#/c/109755/ https://review.openstack.org/#/c/114817/ https://review.openstack.org/#/c/117467/ https://review.openstack.org/#/c/117283/ https://review.openstack.org/#/c/98322/ All but the last had +A, and were in the gate at the time it was closed. The last had not yet been approved, but is ready for core review. It has recently had some orthogonal changes split out to simplify it considerably. It is largely a code motion patch, and has been given +1 by VMware CI multiple times. They're all internal to the VMWare driver, have multiple ACKs from VMWare maintainers as well as core, so don't require extra review time. So I think it is a reasonable request. ACK, I'll sponsor it. +1 here - I've already looked at a number of those. N. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Nova] [FFE] alternative request for v2-on-v3-api
Hi, I'd like to request an FFE for v2.1 API patches. This request is different from Christopher's one. His request is for the approved patches, but this is for some patches which are not approved yet. https://review.openstack.org/#/c/113169/ : flavor-manage API https://review.openstack.org/#/c/114979/ : quota-sets API https://review.openstack.org/#/c/115197/ : security_groups API I think these APIs are widely used and important, so I'd like to test the v2.1 API with them together in the RC phase. Two of them have gotten one +2 on each PS and the other one has gotten one +1. Thanks Ken'ichi Ohmichi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] [glance] do NOT ever sort requirements.txt
On 09/03/2014 09:09 PM, Clark Boylan wrote: On Wed, Sep 3, 2014, at 11:51 AM, Kuvaja, Erno wrote: -Original Message- From: Sean Dague [mailto:s...@dague.net] Sent: 03 September 2014 13:37 To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [all] [glance] do NOT ever sort requirements.txt I'm not sure why people keep showing up with sort requirements patches like - https://review.openstack.org/#/c/76817/6, however, they do. All of these need to be -2ed with prejudice. requirements.txt is not a declarative interface. The order is important as pip processes it in the order it is. Changing the order has impacts on the overall integration which can cause wedges later. So please stop. -Sean -- Sean Dague http://dague.net Hi Sean & all, Could you please open this up a little bit? What are we afraid of breaking regarding the order of these requirements? I tried to go through the pip documentation but I could not find a reason for the specific order of the lines, though there were references to keeping the order. I'm now assuming one thing here as I do not know if that's the case. None of the packages enables/disables functionality depending on what has been installed on the system before, but they have their own dependencies to provide those. Based on this assumption I can think of only one scenario causing us issues. That is us abusing the example in point 2 of https://pip.pypa.io/en/latest/user_guide.html#requirements-files meaning: we install package X depending on package Y>=1.0,<2.0 before installing package Z depending on Y>=1.0, to ensure that we get package Y<2.0 without pinning package Y in our requirements.txt. I certainly hope that this is not the case, as depending on a 3rd party package to provide us a specific version of a dependency would be extremely stupid. Other than that I really don't know how the order could cause us issues, but I would be really happy to learn something new today if that is the case or if my assumption went wrong. Best Regards, Erno (jokke_) Kuvaja ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev The issue is described in the bug that Josh linked (https://github.com/pypa/pip/issues/988). Basically pip doesn't do dependency resolution in a way that lets you treat requirements as order independent. For that to be the case pip would have to evaluate all dependencies together, then install the intersection of those dependencies. Instead it iterates over the list(s) in order and evaluates each dependency as it is found. Your example basically describes where this breaks. You can both depend on the same dependency at different versions and pip will install a version that satisfies only one of the dependencies and not the other, leading to a failed install. However I think a more common case is that openstack will pin a dependency and say Y>=1.0,<2.0 and the X dependency will say Y>=1.0. If the X dependency comes first you get version 2.5, which is not valid for your specification of Y>=1.0,<2.0, and pip fails. You fix this by listing Y before the X dependency that would otherwise install Y with less restrictive boundaries. Another example of a slightly different failure would be hacking, flake8, pep8, and pyflakes. Hacking installs a specific version of flake8, pep8, and pyflakes so that we do static lint checking with consistent checks each release. 
If you sort this list alphabetically instead of allowing hacking to install its deps, flake8 will come first and you can get a different version of pep8. Different versions of pep8 check different things and now the gate has broken. The most problematic thing is you can't count on your dependencies not breaking you if they come first (because they are evaluated first). So in cases where we know order is important (hacking and pbr and probably a handful of others) we should be listing them as early as possible in the requirements. So, is there a specific order to look out for? AFAIU requirements should have pbr as the first requirement and test-requirements should have hacking as the first one. Is there anything else? What's the best place to document this? Andreas -- Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
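To make the failure mode Clark describes concrete, here is a minimal sketch of the ordering pitfall in a requirements.txt (package names and version bounds are invented for illustration; the real pins live in the global requirements repository):

    # Broken when sorted alphabetically: pip evaluates lines in order,
    # so X is installed first and pulls in the newest Y (say 2.5); the
    # later, stricter line then conflicts and pip fails.
    X               # X itself declares: Y>=1.0
    Y>=1.0,<2.0

    # Working order: satisfy Y under the strict bound first, so that
    # when X is evaluated its looser Y>=1.0 is already met by Y 1.x.
    Y>=1.0,<2.0
    X

The same reasoning puts pbr at the top of requirements.txt and hacking at the top of test-requirements.txt, ahead of anything that would pull in flake8, pep8 or pyflakes with looser bounds.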
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem). My only question is about the need to separate out each virt driver into a separate project, wouldn't you accomplish a lot of the benefit by creating a single virt project that includes all of the drivers? I wouldn't necessarily expect a VMware guy to understand the specifics of the HyperV implementation but both people should understand what a virt driver does, how it interfaces to Nova and they should be able to intelligently review each other's code. -- Don Dugger Censeo Toto nos in Kansa esse decisse. - D. Gale Ph: 303/443-3786 -Original Message- From: Daniel P. Berrange [mailto:berra...@redhat.com] Sent: Thursday, September 4, 2014 4:24 AM To: OpenStack Development Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers Position statement == Over the past year I've increasingly come to the conclusion that Nova is heading for (or probably already at) a major crisis. If steps are not taken to avert this, the project is likely to lose a non-trivial amount of talent, both regular code contributors and core team members. That includes myself. This is not good for Nova's long term health and so should be of concern to anyone involved in Nova and OpenStack. For those who don't want to read the whole mail, the executive summary is that the nova-core team is an unfixable bottleneck in our development process with our current project structure. The only way I see to remove the bottleneck is to split the virt drivers out of tree and let them all have their own core teams in their area of code, leaving current nova core to focus on all the common code outside the virt driver impls. I none the less urge people to read the whole mail. Background information == I see many factors coming together to form the crisis - Burn out of core team members from over work - Difficulty bringing new talent into the core team - Long delay in getting code reviewed and merged - Marginalization of code areas which aren't popular - Increasing size of nova code through new drivers - Exclusion of developers without corporate backing Each item on its own may not seem too bad, but combined they add up to a big problem. Core team burn out -- Having been involved in Nova for several dev cycles now, it is clear that the backlog of code up for review never goes away. Even intensive code review efforts at various points in the dev cycle make only a small impact on the backlog. This has a pretty significant impact on core team members, as their work is never done. At best, the dial is sometimes set to 10, instead of 11. Many people, myself included, have built tools to help deal with the reviews in a more efficient manner than plain gerrit allows for. These certainly help, but they can't ever solve the problem on their own - just make it slightly more bearable. And this is not even considering that core team members might have useful contributions to make in ways beyond just code review. Ultimately the workload is just too high to sustain the levels of review required, so core team members will eventually burn out (as they have done many times already). Even if one person attempts to take the initiative to heavily invest in review of certain features it is often to no avail. Unless a second dedicated core reviewer can be found to 'tag team' it is hard for one person to make a difference. 
The end result is that a patch is +2d and then sits idle for weeks or more until a merge conflict requires it to be reposted at which point even that one +2 is lost. This is a pretty demotivating outcome for both reviewers and the patch contributor. New core team talent It can't escape attention that the Nova core team does not grow in size very often. When Nova was younger and its code base was smaller, it was easier for contributors to get onto core because the base level of knowledge required was that much smaller. To get onto core today requires a major investment in learning Nova over a year or more. Even people who potentially have the latent skills may not have the time available to invest in learning the entirety of Nova. With the number of reviews proposed to Nova, the core team should probably be at least double its current size[1]. There is plenty of expertise in the project as a whole but it is typically focused into specific areas of the codebase. There is nowhere we can find 20 more people with broad knowledge of the codebase who could be promoted even over the next year, let alone today. This is ignoring that many existing members of core are relatively inactive due to burnout and so need replacing. That means we really need another 25-30 people for core. That's not going to happen. Code review delays
Re: [openstack-dev] [Nova] [feature freeze exception] Move to oslo.db
On Wed, Sep 3, 2014 at 11:30 PM, Michael Still mi...@stillhq.com wrote: I'm good with this one too, so that makes three if Joe is ok with this. I am ok with this, I hope the move to oslo.db will fix a few bugs for us and the nova patch to review isn't too bad. @Josh -- can you please take a look at the TH failures? Thanks, Michael On Wed, Sep 3, 2014 at 8:10 PM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 9/3/2014 5:08 PM, Andrey Kurilin wrote: Hi All! I'd like to ask for a feature freeze exception for porting nova to use oslo.db. This change not only removes 3k LOC, but fixes 4 bugs(see commit message for more details) and provides relevant, stable common db code. Main maintainers of oslo.db(Roman Podoliaka and Victor Sergeyev) are OK with this. Joe Gordon and Matt Riedemann are already signing up, so we need one more vote from Core developer. By the way a lot of core projects are using already oslo.db for a while: keystone, cinder, glance, ceilometer, ironic, heat, neutron and sahara. So migration to oslo.db won’t produce any unexpected issues. Patch is here: https://review.openstack.org/#/c/101901/ -- Best regards, Andrey Kurilin. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Just re-iterating my agreement to sponsor this. I'm waiting for the latest patch set to pass Jenkins and for Roman to review after his comments from the previous patch set and -1. Otherwise I think this is nearly ready to go. The turbo-hipster failures on the change appear to be infra issues in t-h rather than problems with the code. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] WARNING: upcoming dependency change to oslotest
Next week the Oslo team will be releasing a new version of oslotest that replaces its use of the “mox” library with “mox3”. This will allow us to prepare a packaged version of oslotest that works on both python 2 and 3, which is necessary for porting some of the other Oslo libraries as well as applications which are trying to use Oslo and support python 3. mox3 has the same API as mox, so if your test suite uses oslotest.moxstubout you shouldn’t notice any difference. If you are using oslotest but also import mox directly in some of your test modules and do not have an explicit dependency on mox, your tests will break. There are two ways to fix them: change them to use the moxstubout module to get a mox instance, or add mox to your test-requirements.txt list. The first solution, using moxstubout from oslotest, is preferred because it means your test suite is one step closer to being python 3 ready. However, updating test-requirements.txt may be a less invasive change and so it might be more expedient to use that approach for now. Doug ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
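For anyone taking the preferred route, the change looks roughly like this (a sketch only -- the MoxStubout fixture attributes shown here are assumed from the module's stated purpose, so double-check them against your oslotest version):

    # Before: test modules did "import mox" and built mox.Mox() directly.
    # After: get the mox instance from the oslotest fixture, which is
    # backed by mox3 and therefore works on both python 2 and 3.
    import os

    from oslotest import base
    from oslotest import moxstubout


    class ExampleTest(base.BaseTestCase):

        def setUp(self):
            super(ExampleTest, self).setUp()
            fixture = self.useFixture(moxstubout.MoxStubout())
            self.mox = fixture.mox        # replaces mox.Mox()
            self.stubs = fixture.stubs    # replaces stubout.StubOutForTesting()

        def test_stubbed_call(self):
            # The usual mox record/replay flow is unchanged.
            self.mox.StubOutWithMock(os.path, 'exists')
            os.path.exists('/no/such/path').AndReturn(True)
            self.mox.ReplayAll()
            self.assertTrue(os.path.exists('/no/such/path'))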
Re: [openstack-dev] [qa][all][Heat] Packaging of functional tests
On 08/29/2014 05:15 PM, Zane Bitter wrote: On 29/08/14 14:27, Jay Pipes wrote: On 08/26/2014 10:14 AM, Zane Bitter wrote: Steve Baker has started the process of moving Heat tests out of the Tempest repository and into the Heat repository, and we're looking for some guidance on how they should be packaged in a consistent way. Apparently there are a few projects already packaging functional tests in the package projectname.tests.functional (alongside projectname.tests.unit for the unit tests). That strikes me as odd in our context, because while the unit tests run against the code in the package in which they are embedded, the functional tests run against some entirely different code - whatever OpenStack cloud you give it the auth URL and credentials for. So these tests run from the outside, just like their ancestors in Tempest do. There's all kinds of potential confusion here for users and packagers. None of it is fatal and all of it can be worked around, but if we refrain from doing the thing that makes zero conceptual sense then there will be no problem to work around :) I suspect from reading the previous thread about In-tree functional test vision that we may actually be dealing with three categories of test here rather than two: * Unit tests that run against the package they are embedded in * Functional tests that run against the package they are embedded in * Integration tests that run against a specified cloud i.e. the tests we are now trying to add to Heat might be qualitatively different from the projectname.tests.functional suites that already exist in a few projects. Perhaps someone from Neutron and/or Swift can confirm? I'd like to propose that tests of the third type get their own top-level package with a name of the form projectname-integrationtests (second choice: projectname-tempest on the principle that they're essentially plugins for Tempest). How would people feel about standardising that across OpenStack? By its nature, Heat is one of the only projects that would have integration tests of this nature. For Nova, there are some functional tests in nova/tests/integrated/ (yeah, badly named, I know) that are tests of the REST API endpoints and running service daemons (the things that are RPC endpoints), with a bunch of stuff faked out (like RPC comms, image services, authentication and the hypervisor layer itself). So, the integrated tests in Nova are really not testing integration with other projects, but rather integration of the subsystems and processes inside Nova. I'd support a policy that true integration tests -- tests that test the interaction between multiple real OpenStack service endpoints -- be left entirely to Tempest. Functional tests that test interaction between daemons and processes internal to a project should go into /$project/tests/functional/. For Heat, I believe tests that rely on faked-out other OpenStack services but stress the interaction between internal Heat daemons/processes should be in /heat/tests/functional/ and any tests that rely on working, real OpenStack service endpoints should be in Tempest. Well, the problem with that is that last time I checked there was exactly one Heat scenario test in Tempest because tempest-core doesn't have the bandwidth to merge all (any?) of the other ones folks submitted. So we're moving them to openstack/heat for the pure practical reason that it's the only way to get test coverage at all, rather than concerns about overloading the gate or theories about the best venue for cross-project integration testing. 
Hmm, speaking of passive aggressivity... Where can I see a discussion of the Heat integration tests with Tempest QA folks? If you give me some background on what efforts have been made already and what is remaining to be reviewed/merged/worked on, then I can try to get some resources dedicated to helping here. I would greatly prefer just having a single source of integration testing in OpenStack, versus going back to the bad ol' days of everybody under the sun rewriting their own. Note that I'm not talking about functional testing here, just the integration testing... Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
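For concreteness, the naming Zane proposes would produce a tree along these lines (a sketch only; the unit/functional split already exists in some projects, and only the top-level integration package name is the new part):

    heat/
        tests/
            unit/           # runs against the code in this repo
            functional/     # runs against Heat daemons, with other
                            # OpenStack services faked out
    heat_integrationtests/  # Tempest-style: runs from the outside
                            # against whatever cloud the auth URL and
                            # credentials point at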
Re: [openstack-dev] [Nova][FFE] Feature freeze exception for virt-driver-numa-placement
On 4 September 2014 14:07, Nikola Đipanov ndipa...@redhat.com wrote: On 09/04/2014 02:31 PM, Sean Dague wrote: On 09/04/2014 07:58 AM, Nikola Đipanov wrote: Hi team, I am requesting the exception for the feature from the subject (find specs at [1] and outstanding changes at [2]). Some reasons why we may want to grant it: First of all, all patches have been approved in time and just lost the gate race. Rejecting it makes little sense really, as it has been commented on by a good chunk of the core team, most of the invasive stuff (db migrations for example) has already merged, and the few parts that may seem contentious have either been discussed and agreed upon [3], or can easily be addressed in subsequent bug fixes. It would be very beneficial to merge it so that we actually get real testing on the feature ASAP (scheduling features are not tested in the gate so we need to rely on downstream/3rd party/user testing for those). This statement bugs me. It seems kind of backwards to say we should merge a thing that we don't have a good upstream test plan on and put it in a release so that the testing will happen only in the downstream case. The objective reality is that many other things have not had upstream testing for a long time (anything that requires more than 1 compute node in Nova for example, and any scheduling feature - as I mention clearly above), so not sure how that is backwards from any reasonable point of view. Thanks to folks using them, they are still kept working and bugs get fixed. Getting features into the hands of users is extremely important... Anyway, not enough to -1 it, but enough to at least say something. .. but I do not want to get into the discussion about software testing here, not the place really. However, I do think it is very harmful to respond to an FFE request with such blanket statements and generalizations, if only for the message it sends to the contributors (that we really care more about upholding our own myths as a community than users and features). I believe you brought this up as one of your justifications for the FFE. When I read your statement it does sound as though you want to put experimental code in at the final release. I am sure that is not what you had in mind, but I am sure you can also understand Sean's point of view. His point is clear and pertinent to your request. As the person responsible for Nova in HP, I will be interested to see how it operates in practice. I can assure you we will do extensive testing on it before it goes into the wild and we will not put it into practice if we are not happy. Paul Paul Murray Nova Technical Lead, HP Cloud +44 117 312 9309 Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN Registered No: 690597 England. The contents of this message and any attachments to it are confidential and may be legally privileged. If you have received this message in error, you should delete it from your system immediately and advise the sender. To any recipient of this message within HP, unless otherwise stated you should consider this message and attachments as HP CONFIDENTIAL. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] requesting an FFE for SRIOV
Hi, The main sr-iov patches have gone through lots of code reviews, manual rebasing, etc. Now we have some critical refactoring work on the existing infra to get it ready. All the code for refactoring and sr-iov is up for review. https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov thanks, Robert ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [sahara] integration tests in python-saharaclient
Yes, I wrote them. I use them all the time -- no typo that I know of. They are great for spinning up a cluster and running EDP jobs. They may need some polish, but the point is to test the whole chain of operations from the CLI. This is contrary to what most OpenStack projects traditionally do -- most CLI testing is only transformation testing, that is, it tests the output of CLI commands in Tempest but does not test any kind of integration from the CLI. Different communities, however, will have different requirements. At Red Hat, for instance, many of our customers rely heavily on the command line, and our testing includes integration tests from the CLI as the entry point. We want this kind of testing. In fact, in the Icehouse release I found a bug by running the CLI integration tests. There was a mismatch between the CLI and Sahara. These tests are not run in CI currently; however, when/if we end up with more horsepower in CI, they should be. They should not be deleted. Best, Trevor On Wed, 2014-09-03 at 14:58 -0700, Andrew Lazarev wrote: Hi team, Today I've realized that we have some tests called 'integration' in python-saharaclient. Also I've found out that Jenkins doesn't use them and they can't be run starting from April because of typo in tox.ini. Does anyone know what these tests are? Does anyone mind if I delete them since we don't use them anyway? Thanks, Andrew. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [sahara] integration tests in python-saharaclient
by the way, what typo? Trev On Wed, 2014-09-03 at 14:58 -0700, Andrew Lazarev wrote: Hi team, Today I've realized that we have some tests called 'integration' in python-saharaclient. Also I've found out that Jenkins doesn't use them and they can't be run starting from April because of typo in tox.ini. Does anyone know what these tests are? Does anyone mind if I delete them since we don't use them anyway? Thanks, Andrew. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting
Sean Dague wrote: [...] So, honestly, I'll probably remain -1 on the final integration vote, not because Zaqar is bad, but because I'm feeling more firmly that for OpenStack to not leave the small deployers behind we need to redefine the tightly integrated piece of OpenStack to basically the Layer 1 & 2 parts of my diagram, and consider the rest of the layers exciting parts of our ecosystem that more advanced users may choose to deploy to meet their needs. Smaller tent, big ecosystem, easier on ramp. I realize that largely means Zaqar would be caught up in a definition discussion outside of its control, and that's kind of unfortunate, as Flavio and team have been doing a bang up job of late. But we need to stop considering integration as the end game of all interesting software in the OpenStack ecosystem, and I think it's better to have that conversation sooner rather than later. I think it's pretty clear at this point that: (1) we need to have a discussion about layers (base nucleus, optional extra services at the very least) and the level of support we grant to each -- the current binary approach is not working very well (2) If we accept Zaqar next week, it's pretty clear it would not fall in the base nucleus layer but more in an optional extra services layer, together with at the very least Trove and Sahara There are two ways of doing this: follow Sean's approach and -1 integration (and have zaqar apply to that optional layer when we create it), or +1 integration now (and have zaqar follow whichever other integrated projects we place in that layer when we create it). I'm still hesitating on the best approach. I think they yield the same end result, but the -1 approach seems to be a bit more unfair, since it would be purely for reasons we don't (yet) apply to currently-integrated projects... -- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation
On 09/02/2014 07:15 AM, Duncan Thomas wrote: On 11 August 2014 19:26, Jay Pipes jaypi...@gmail.com wrote: The above does not really make sense for MySQL Galera/PXC clusters *if only Galera nodes are used in the cluster*. Since Galera is synchronously replicated, there's no real point in segregating writers from readers, IMO. Better to just spread the write AND read load equally among all Galera cluster nodes. Unfortunately it is possible to get bitten by the difference between 'synchronous' and 'virtually synchronous' in practice. Not in my experience. The thing that has bitten me in practice is Galera's lack of support for SELECT FOR UPDATE, which is used extensively in some of the OpenStack projects. Instead of taking a write-intent lock on one or more record gaps (which is what InnoDB does in the case of a SELECT FOR UPDATE on a local node), Galera happily replicates DML statements to all other nodes in the cluster. If two of those nodes attempt to modify the same row or rows in a table, then the working set replication will fail to certify, which results in a certification timeout, which is then converted to an InnoDB deadlock error. It's the difference between hanging around waiting on a local node for the transaction that called SELECT FOR UPDATE to complete and release the write-intent locks on a set of table rows versus hanging around waiting for the InnoDB deadlock/lock timeout to bubble up from the working set replication certification (which typically is longer than the time taken to lock the rows in a single transaction, and therefore causes thundering herd issues with the conductor attempting to retry stuff due to the use of the @retry_on_deadlock decorator which is so commonly used everywhere). FWIW, I've cc'd a real expert on the matter. Peter, feel free to clarify, contradict, or just ignore me :) Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
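A rough sketch of the pattern Jay describes, for readers who haven't hit it (illustrative only -- FixedIp and retry_on_deadlock here are stand-ins, not the actual Nova or oslo.db code):

    import functools
    import time

    from sqlalchemy.exc import OperationalError


    def retry_on_deadlock(func, max_retries=5):
        # Stand-in for the decorator mentioned above: re-run the whole
        # transaction when the DB reports a deadlock, which is also how
        # a failed Galera certification surfaces to the client.
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except OperationalError as exc:
                    if 'deadlock' not in str(exc).lower() \
                            or attempt == max_retries - 1:
                        raise
                    # A real implementation must also roll back the
                    # session before retrying; the crude backoff here
                    # leaves the thundering herd risk Jay mentions.
                    time.sleep(0.1 * (attempt + 1))
        return wrapped


    @retry_on_deadlock
    def allocate_fixed_ip(session, instance_uuid):
        # On a single InnoDB node the FOR UPDATE blocks other writers
        # on these rows until commit. On Galera nothing blocks: a
        # conflicting write on another node only fails at certification
        # time, surfacing as the deadlock error retried above.
        ip = (session.query(FixedIp)          # FixedIp: hypothetical model
                     .filter_by(allocated=False)
                     .with_for_update()
                     .first())
        ip.allocated = True
        ip.instance_uuid = instance_uuid
        session.commit()
        return ip.address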
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
My only question is about the need to separate out each virt driver into a separate project, wouldn't you accomplish a lot of the benefit by creating a single virt project that includes all of the drivers? I don't think there's particularly a *point* to having all drivers in one repo. Part of code review is looking for code gotchas, but part of code review is looking for subtle issues that are caused by the very nature of the driver. A HyperV core reviewing a libvirt change should certainly be able to provide the former, but most likely cannot provide the latter to a sufficient degree (if he or she can, then he or she should be a libvirt core as well). A strong +1 to Dan's proposal. I think this would also make it easier for non-core reviewers to get started reviewing, without having a specialized tool setup. Best Regards, Solly Ross P.S. This is a crisis. A large crisis. In fact, if you got a moment, it's a twelve-storey crisis with a magnificent entrance hall, carpeting throughout, 24-hour portage, and an enormous sign on the roof, saying 'This Is a Large Crisis'. A large crisis requires a large plan. Ha! - Original Message - From: Donald D Dugger donald.d.dug...@intel.com To: Daniel P. Berrange berra...@redhat.com, OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Thursday, September 4, 2014 10:33:27 AM Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem). My only question is about the need to separate out each virt driver into a separate project, wouldn't you accomplish a lot of the benefit by creating a single virt project that includes all of the drivers? I wouldn't necessarily expect a VMware guy to understand the specifics of the HyperV implementation but both people should understand what a virt driver does, how it interfaces to Nova and they should be able to intelligently review each other's code. -- Don Dugger Censeo Toto nos in Kansa esse decisse. - D. Gale Ph: 303/443-3786 -Original Message- From: Daniel P. Berrange [mailto:berra...@redhat.com] Sent: Thursday, September 4, 2014 4:24 AM To: OpenStack Development Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers Position statement == Over the past year I've increasingly come to the conclusion that Nova is heading for (or probably already at) a major crisis. If steps are not taken to avert this, the project is likely to lose a non-trivial amount of talent, both regular code contributors and core team members. That includes myself. This is not good for Nova's long term health and so should be of concern to anyone involved in Nova and OpenStack. For those who don't want to read the whole mail, the executive summary is that the nova-core team is an unfixable bottleneck in our development process with our current project structure. The only way I see to remove the bottleneck is to split the virt drivers out of tree and let them all have their own core teams in their area of code, leaving current nova core to focus on all the common code outside the virt driver impls. I none the less urge people to read the whole mail. 
Background information == I see many factors coming together to form the crisis - Burn out of core team members from over work - Difficulty bringing new talent into the core team - Long delay in getting code reviewed and merged - Marginalization of code areas which aren't popular - Increasing size of nova code through new drivers - Exclusion of developers without corporate backing Each item on its own may not seem too bad, but combined they add up to a big problem. Core team burn out -- Having been involved in Nova for several dev cycles now, it is clear that the backlog of code up for review never goes away. Even intensive code review efforts at various points in the dev cycle make only a small impact on the backlog. This has a pretty significant impact on core team members, as their work is never done. At best, the dial is sometimes set to 10, instead of 11. Many people, myself included, have built tools to help deal with the reviews in a more efficient manner than plain gerrit allows for. These certainly help, but they can't ever solve the problem on their own - just make it slightly more bearable. And this is not even considering that core team members might have useful contributions to make in ways beyond just code review. Ultimately the workload is just too high to sustain the levels of review required, so core team members will eventually burn out (as they have done many times already). Even if one person attempts to take the initiative to
Re: [openstack-dev] Treating notifications as a contract (CADF)
On 9/04/2014 Sandy Walsh wrote: Yesterday, we had a great conversation with Matt Rutkowski from IBM, one of the authors of the CADF spec. I was having a disconnect on what CADF offers and got it clarified. My assumption was CADF was a set of transformation/extraction rules for taking data from existing data structures and defining them as well-known things. For example, CADF needs to know who sent this notification. I thought CADF would give us a means to point at an existing data structure and say that's where you find it. But I was wrong. CADF is a full-on schema/data structure of its own. It would be a fork-lift replacement for our existing notifications. This was my aha as well, following a similar discussion with Matt and team, but also note that they've articulated an approach for bolt-on changes that would enable CADF content in existing pipelines. (https://wiki.openstack.org/wiki/Ceilometer/blueprints/support-standard-audit-formats) However, if your service hasn't really adopted notifications yet (green field) or you can handle a fork-lift replacement, CADF is a good option. There are a few gotchas though. If you have required data that is outside of the CADF spec, it would need to go in the attachment section of the notification and that still needs a separate schema to define it. Matt's team is very receptive to extending the spec to include these special cases though. Agreed that Matt's team was very willing to extend, but I still wonder about having to migrate appended data from its pre-approval location to its permanent location, depending on the speed of the CADF standard update. Anyway, I've written up all the options (as I see them) [1] with the advantages/disadvantages of each approach. It's just a strawman, so bend/spindle/mutilate. Cool...will add comments there. Look forward to feedback! -S [1] https://wiki.openstack.org/wiki/NotificationsAndCADF On 9/3/2014 12:30 PM, Sandy Walsh wrote: On 9/3/2014 11:32 AM, Chris Dent wrote: On Wed, 3 Sep 2014, Sandy Walsh wrote: We're chatting with IBM about CADF and getting down to specifics on their applicability to notifications. Once I get StackTach.v3 into production I'm keen to get started on revisiting the notification format and oslo.messaging support for notifications. Perhaps a hangout for those keenly interested in doing something about this? That seems like a good idea. I'd like to be a part of that. I would, too, and I would suggest that much of the Ceilometer team would. Unfortunately I won't be at summit but would like to contribute what I can before and after. I took some notes on this a few weeks ago and extracted what seemed to be the two main threads or ideas that were revealed by the conversation that happened in this thread: * At the micro level have versioned schema for notifications such that one end can declare "I am sending version X of notification foo.bar.Y" and the other end can effectively deal. Yes, that's table-stakes I think. Putting structure around the payload section. Beyond type and version we should be able to attach meta information like public/private visibility and perhaps hints for external mapping (this trait -> that trait in CADF, for example). * At the macro level standardize a packaging or envelope of all notifications so that they can be consumed by very similar code. That is: constrain the notifications in some way so we can also constrain the consumer code. That's the intention of what we have now. The top level traits are standard, the payload is open. 
We really only require: message_id, timestamp and event_type. For auditing we need to cover Who, What, When, Where, Why, OnWhat, OnWhere, FromWhere. To wit, I think we've made good progress in this by defining what the minimum content is for PaaS service notifications and getting agreement around https://review.openstack.org/#/c/113396/11/doc/source/format.rst for the Juno release. It's been driven by many of these same questions but is fairly narrow in scope; it defines a minimum set of content, but doesn't tackle the question of structure (beyond trait typing). The timing seems right to dig deeper. These ideas serve two different purposes: One is to ensure that existing notification use cases are satisfied with robustness and provide a contract between two endpoints. The other is to allow a fecund notification environment that allows and enables many participants. Good goals. When Producer and Consumer know what to expect, things are good ... I know to find the Instance ID here. When the consumer wants to deal with a notification as a generic object, things get tricky (find the instance ID in the payload; what is the image type?; is this an error notification?) Basically, how do we define the principal artifacts for
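To make the two levels concrete, here is a sketch of the shape under discussion (the values are invented; only message_id, timestamp and event_type are the required traits listed above):

    import datetime
    import uuid

    notification = {
        # The standardized envelope: the only required top-level traits.
        'message_id': str(uuid.uuid4()),
        'timestamp': datetime.datetime.utcnow().isoformat(),
        'event_type': 'compute.instance.create.end',
        # Common but optional envelope trait.
        'publisher_id': 'compute.host-0001',
        # The open, per-event section -- the part the thread wants to
        # version ("version X of notification foo.bar.Y") and annotate
        # with visibility and mapping hints (this trait -> CADF trait).
        'payload': {
            'instance_id': 'hypothetical-uuid',
            'state': 'active',
        },
    }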
[openstack-dev] [sahara] team meeting Sep 4 1800 UTC
Hi folks, We'll be having the Sahara team meeting as usual in #openstack-meeting-alt channel. Agenda: https://wiki.openstack.org/wiki/Meetings/SaharaAgenda#Next_meetings http://www.timeanddate.com/worldclock/fixedtime.html?msg=Sahara+Meeting&iso=20140904T18 -- Sincerely yours, Sergey Lukjanov Sahara Technical Lead (OpenStack Data Processing) Principal Software Engineer Mirantis Inc. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?
On 04/09/14 14:54, Jeremy Stanley wrote: On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote: [...] How would people feel about turning [auto-abandon] back on? A lot of reviewers (myself among them) feel auto-abandon was a cold and emotionless way to provide feedback on a change. Especially on high-change-volume projects where core reviewers may at times get sucked into triaging other problems for long enough that the auto-abandoner kills lots of legitimate changes (possibly from new contributors who will get even more disgusted by this than the silence itself and walk away indefinitely with the impression that we really aren't a welcoming development community at all). Ok, I see how this may be unwelcoming to a new contributor, a feeling that could be justified in some cases. Any established contributor should (I know I did when it was enforced) see it as part of the process. Perhaps we exempt new users? On the other hand I'm not talking about abandoning a change because there was silence for a fixed period of time, I'm talking about abandoning it because it got negative feedback and it wasn't addressed either through discussion or a new patch. I have no problem if we push the inactivity period out to a month or more, I just think there needs to be a cutoff at some stage. Can it be done on a per project basis? It can, by running your own... but again it seems far better for core reviewers to decide if a change has potential or needs to be abandoned--that way there's an accountable human making that deliberate choice rather than the review team hiding behind an automated process so that no one is to blame for hurt feelings besides the infra operators who are enforcing this draconian measure for you. There are plenty of examples of places where we have automated processes in the community (some of which may hurt feelings) in order to take load off specific individuals or the community in general. In fact automating processes in places where people don't scale or are bottlenecks seems to be a common theme. We automate CI and give people negative feedback. We expire bugs in some projects that are Incomplete and are 60 days inactive. I really don't see this as the review team hiding behind an automated process. A patch got negative feedback and we're automating the process to prompt the submitter to deal with it. It may be more friendly if it was a 2-step process: 1. (after a few days of inactivity) Add a comment saying you got negative feedback, with suggestions of how to proceed and information that the review will be auto-abandoned if nothing is done in X number of days. 2. Auto-abandon the patch, with as much information as possible on how to reopen if needed. To make the whole process a little friendlier we could increase the time frame from 1 week to 2. <snark>How about just automatically abandoning any new change as soon as it's published, and if the contributor really feels it's important they'll unabandon it.</snark> ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
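For what it's worth, the policy being described maps fairly directly onto a Gerrit search, so a human (or a bot posting the step-1 warning) could list the candidates before anything is abandoned. Something like the following (exact operator support depends on the Gerrit version):

    status:open age:4w label:Code-Review<=-1

i.e. open changes that still carry a negative review and have seen no update for four weeks.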
Re: [openstack-dev] [sahara] integration tests in python-saharaclient
As for the sahara-ci, I don't think that we'll have enough free resources on it to run one more set of tests. So, waiting for more 3rd party CIs :) On Thu, Sep 4, 2014 at 6:58 PM, Trevor McKay tmc...@redhat.com wrote: by the way, what typo? Trev On Wed, 2014-09-03 at 14:58 -0700, Andrew Lazarev wrote: Hi team, Today I've realized that we have some tests called 'integration' in python-saharaclient. Also I've found out that Jenkins doesn't use them and they can't be run starting from April because of typo in tox.ini. Does anyone know what these tests are? Does anyone mind if I delete them since we don't use them anyway? Thanks, Andrew. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Sincerely yours, Sergey Lukjanov Sahara Technical Lead (OpenStack Data Processing) Principal Software Engineer Mirantis Inc. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting
On 09/04/2014 04:59 PM, Thierry Carrez wrote: Sean Dague wrote: [...] So, honestly, I'll probably remain -1 on the final integration vote, not because Zaqar is bad, but because I'm feeling more firmly that for OpenStack to not leave the small deployers behind we need to redefine the tightly integrated piece of OpenStack to basically the Layer 1 & 2 parts of my diagram, and consider the rest of the layers exciting parts of our ecosystem that more advanced users may choose to deploy to meet their needs. Smaller tent, big ecosystem, easier on ramp. I realize that largely means Zaqar would be caught up in a definition discussion outside of its control, and that's kind of unfortunate, as Flavio and team have been doing a bang up job of late. But we need to stop considering integration as the end game of all interesting software in the OpenStack ecosystem, and I think it's better to have that conversation sooner rather than later. I think it's pretty clear at this point that: (1) we need to have a discussion about layers (base nucleus, optional extra services at the very least) and the level of support we grant to each -- the current binary approach is not working very well (2) If we accept Zaqar next week, it's pretty clear it would not fall in the base nucleus layer but more in an optional extra services layer, together with at the very least Trove and Sahara There are two ways of doing this: follow Sean's approach and -1 integration (and have zaqar apply to that optional layer when we create it), or +1 integration now (and have zaqar follow whichever other integrated projects we place in that layer when we create it). As I mentioned in my reply to Sean's email, I believe +1 integration is the correct thing to do. I know it's hard to believe that I'm saying this with my OpenStack hat on and not Zaqar's, but that's the truth. I truly believe we can't stop OpenStack's growth on this. We'll manage these growth details later on as we've done so far. Growing is as important as managing the growth. Though, in this case we're not growing without any clue of what will happen. We've a well-known path that all integrated projects have followed and, in this specific case, Zaqar is following. Re-evaluating projects is something that has happened - and should happen - every once in a while. Once we have a place for these optional services, we will have to re-evaluate all the integrated projects and move those that fit into that category. I'm still hesitating on the best approach. I think they yield the same end result, but the -1 approach seems to be a bit more unfair, since it would be purely for reasons we don't (yet) apply to currently-integrated projects... +1 Cheers, Flavio -- @flaper87 Flavio Percoco ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 04/09/2014 15:36, Gary Kotton wrote: Hi, I do not think that Nova is in a death spiral. I just think that the current way of working is strangling the project. I do not understand why we need to split drivers out of the core project. Why not have the ability to provide 'core review' status to people for reviewing those parts of the code? We have enough talented people in OpenStack to be able to write a driver above gerrit to enable that. Fragmenting the project will be very unhealthy. For what it is worth, having a release date at the end of a vacation is really bad. Look at the numbers: http://stackalytics.com/report/contribution/nova-group/30 Thanks Gary From my perspective, the raw number of reviews should not be the only metric for saying if someone is good for being a core. Indeed, it's quite easy to provide some comments on cosmetics, but if you look at why patches are getting a -1 from a core, that's mostly because of a more important design issue or because they go against another current effort. Also, I can note that Stackalytics metrics are *really* different from other tools like http://russellbryant.net/openstack-stats/nova-reviewers-30.txt As a non-core person, I can just say that a core person must at least be there during Nova meetings and voice their opinions, provide some help with the gate status, look at bugs, give feedback to newcomers, etc., and not just click on -1 or +1. Here, the problem is that the core team is not scalable: I don't want to provide examples from governments, but just adding more people is often not the solution. Instead, delegating to subteams seems like an intermediate solution, as it could let the core team only approve and leave the subteam's half-cores to review the iterations until they consider the patch good enough to be merged. Of course, nova cores could still bypass half-cores, as they have the whole knowledge of Nova, or they could disapprove what the half-cores agreed on, but that would free a lot of time for cores without giving them more bureaucracy. I really like Dan's proposal of splitting code into different repos with separate teams and a single PTL (that's exactly the difference between a Program and a Project), but as it requires some prework, I'm just thinking of allocating half-cores as a short-term solution until all the bits are sorted out. And yes, there is urgency; I have also felt the pain. -Sylvain On 9/4/14, 3:59 PM, Thierry Carrez thie...@openstack.org wrote: Like I mentioned before, I think the only way out of the Nova death spiral is to split code and give control over it to smaller dedicated review teams. This is one way to do it. Thanks Dan for pulling this together :) A couple comments inline: Daniel P. Berrange wrote: [...] This is a crisis. A large crisis. In fact, if you got a moment, it's a twelve-storey crisis with a magnificent entrance hall, carpeting throughout, 24-hour portage, and an enormous sign on the roof, saying 'This Is a Large Crisis'. A large crisis requires a large plan. [...] I totally agree. We need a plan now, because we can't go through another cycle without a solution in sight. [...] This has quite a few implications for the way development would operate. - The Nova core team, at least, would be voluntarily giving up a big amount of responsibility over the evolution of virt drivers. Due to human nature, people are not good at giving up power, so this may be painful to swallow. 
Realistically current nova core are not experts in most of the virt drivers to start with, and more importantly, we clearly do not have sufficient time to do a good job of review with everything submitted. Much of the current need for core review of virt drivers is to prevent the mis-use of a poorly defined virt driver API...which can be mitigated - see later point(s) - Nova core would/should not have automatic +2 over the virt driver repositories since it is unreasonable to assume they have the suitable domain knowledge for all virt drivers out there. People would of course be able to be members of multiple core teams. For example John G would naturally be nova-core and nova-xen-core. I would aim for nova-core and nova-libvirt-core, and so on. I do not want any +2 responsibility over VMWare/HyperV/Docker drivers since they're not my area of expertise - I only look at them today because they have no other nova-core representation. - Not sure if it implies the Nova PTL would be solely focused on Nova common. e.g. would there continue to be one PTL over all virt driver implementation projects, or would each project have its own PTL. Maybe this is irrelevant if a Czars approach is chosen by virt driver projects for their work. I'd be inclined to say that a single PTL should stay as a figurehead to represent all the virt driver projects, acting as a point of
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 9/4/2014 9:57 AM, Daniel P. Berrange wrote: On Thu, Sep 04, 2014 at 02:33:27PM +0000, Dugger, Donald D wrote: Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem). Thanks for taking the time to read and give feedback My only question is about the need to separate out each virt driver into a separate project, wouldn't you accomplish a lot of the benefit by creating a single virt project that includes all of the drivers? I wouldn't necessarily expect a VMware guy to understand the specifics of the HyperV implementation but both people should understand what a virt driver does, how it interfaces to Nova and they should be able to intelligently review each other's code. A single repo for virt drivers would have all the same costs of separating from nova common, but with fewer of the benefits of separate repos per driver. IOW, if we're going to split the virt drivers out from the nova common, then we should go all the way. I think separate driver repos are fairly compelling for a number of reasons besides just core team size. As mentioned elsewhere it allows better targeting of CI test jobs. ie a VMware CI job can be easily made gating for only VMware code changes. So VMWare CI instability won't affect libvirt code submissions, and libvirt CI instability won't affect VMware code submissions. Separate repos means that people starting off a new driver (like Ironic or Docker) would not have to immediately meet the same very high quality testing bar that existing drivers do. They can evolve at their own pace and not have to then undergo the disruption of jumping from their initial repo to the 'official' repo. Finally, I would like each driver's team to be isolated from each other in terms of code review capacity planning as far as practical - ie the libvirt team should be able to accept as many libvirt features as they can handle without being concerned that they'll reduce what vmware is able to accept (though changes involving the nova common code would obviously still contend). Position statement == Over the past year I've increasingly come to the conclusion that Nova is heading for (or probably already at) a major crisis. If steps are not taken to avert this, the project is likely to lose a non-trivial amount of talent, both regular code contributors and core team members. That includes myself. This is not good for Nova's long term health and so should be of concern to anyone involved in Nova and OpenStack. For those who don't want to read the whole mail, the executive summary is that the nova-core team is an unfixable bottleneck in our development process with our current project structure. The only way I see to remove the bottleneck is to split the virt drivers out of tree and let them all have their own core teams in their area of code, leaving current nova core to focus on all the common code outside the virt driver impls. I none the less urge people to read the whole mail. Background information == I see many factors coming together to form the crisis - Burn out of core team members from over work - Difficulty bringing new talent into the core team - Long delay in getting code reviewed and merged - Marginalization of code areas which aren't popular - Increasing size of nova code through new drivers - Exclusion of developers without corporate backing Each item on its own may not seem too bad, but combined they add up to a big problem. 
Core team burn out -- Having been involved in Nova for several dev cycles now, it is clear that the backlog of code up for review never goes away. Even intensive code review efforts at various points in the dev cycle make only a small impact on the backlog. This has a pretty significant impact on core team members, as their work is never done. At best, the dial is sometimes set to 10, instead of 11. Many people, myself included, have built tools to help deal with the reviews in a more efficient manner than plain gerrit allows for. These certainly help, but they can't ever solve the problem on their own - just make it slightly more bearable. And this is not even considering that core team members might have useful contributions to make in ways beyond just code review. Ultimately the workload is just too high to sustain the levels of review required, so core team members will eventually burn out (as they have done many times already). Even if one person attempts to take the initiative to heavily invest in review of certain features it is often to no avail. Unless a second dedicated core reviewer can be found to 'tag team' it is hard for one person to make a difference. The end result is that a patch is +2d and then sits idle for weeks or more until a merge conflict requires it to be reposted at which point even that one +2 is lost. This is a pretty
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 04/09/2014 17:00, Solly Ross wrote:

My only question is about the need to separate out each virt driver into a separate project; wouldn't you accomplish a lot of the benefit by creating a single virt project that includes all of the drivers?

I don't think there's particularly a *point* to having all drivers in one repo. Part of code review is looking for code gotchas, but part of code review is looking for subtle issues that are caused by the very nature of the driver. A HyperV core reviewing a libvirt change should certainly be able to provide the former, but most likely cannot provide the latter to a sufficient degree (if he or she can, then he or she should be a libvirt core as well).

A strong +1 to Dan's proposal. I think this would also make it easier for non-core reviewers to get started reviewing, without having a specialized tool setup.

As I said previously, I'm also giving a +1 to this proposal. That said, as I think it will take at least one iteration to get this done (look at the scheduler split and how long we've been working on it), I also think we need a short-term solution like the one proposed by Thierry, ie. what I call half-cores - people who help review a code area and free up time for cores, who then just approve instead of following each iteration.

-Sylvain

Best Regards, Solly Ross

P.S. This is a crisis. A large crisis. In fact, if you got a moment, it's a twelve-storey crisis with a magnificent entrance hall, carpeting throughout, 24-hour portage, and an enormous sign on the roof, saying 'This Is a Large Crisis'. A large crisis requires a large plan. Ha!

- Original Message -
From: Donald D Dugger donald.d.dug...@intel.com
To: Daniel P. Berrange berra...@redhat.com, OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Sent: Thursday, September 4, 2014 10:33:27 AM
Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem).

My only question is about the need to separate out each virt driver into a separate project; wouldn't you accomplish a lot of the benefit by creating a single virt project that includes all of the drivers? I wouldn't necessarily expect a VMware guy to understand the specifics of the HyperV implementation, but both people should understand what a virt driver does and how it interfaces to Nova, and they should be able to intelligently review each other's code.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-Original Message-
From: Daniel P. Berrange [mailto:berra...@redhat.com]
Sent: Thursday, September 4, 2014 4:24 AM
To: OpenStack Development
Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

Position statement
==

Over the past year I've increasingly come to the conclusion that Nova is heading for (or is probably already at) a major crisis. If steps are not taken to avert this, the project is likely to lose a non-trivial amount of talent, both regular code contributors and core team members. That includes myself. This is not good for Nova's long term health and so should be of concern to anyone involved in Nova and OpenStack.

For those who don't want to read the whole mail, the executive summary is that the nova-core team is an unfixable bottleneck in our development process with our current project structure.
The only way I see to remove the bottleneck is to split the virt drivers out of tree and let them all have their own core teams in their area of code, leaving current nova core to focus on all the common code outside the virt driver impls. I nonetheless urge people to read the whole mail.

Background information
==

I see many factors coming together to form the crisis

- Burn out of core team members from over work
- Difficulty bringing new talent into the core team
- Long delay in getting code reviewed and merged
- Marginalization of code areas which aren't popular
- Increasing size of nova code through new drivers
- Exclusion of developers without corporate backing

Each item on its own may not seem too bad, but combined they add up to a big problem.

Core team burn out
--

Having been involved in Nova for several dev cycles now, it is clear that the backlog of code up for review never goes away. Even intensive code review efforts at various points in the dev cycle make only a small impact on the backlog. This has a pretty significant impact on core team members, as their work is never done. At best, the dial is sometimes set to 10, instead of 11.

Many people, myself included, have built tools to help deal with the reviews in a more efficient manner than plain gerrit allows for. These certainly help, but they can't ever solve the problem on their own - they just make it slightly more bearable.
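Those dashboard-style tools are simple in spirit: most boil down to asking Gerrit's REST API for the changes that are slipping through the cracks. A minimal sketch of such a helper follows, assuming only Gerrit's standard query API; the two-week staleness threshold and the target project are arbitrary choices for illustration:

    # Minimal sketch of a review-backlog helper against Gerrit's REST
    # API. The "age:2w" threshold and the project are illustrative only.
    import json

    import requests

    GERRIT = "https://review.openstack.org"

    def stalled_plus_twos(project="openstack/nova"):
        """List open changes with a +2 that haven't moved in two weeks."""
        query = "status:open project:%s label:Code-Review=2 age:2w" % project
        resp = requests.get("%s/changes/" % GERRIT, params={"q": query})
        resp.raise_for_status()
        # Gerrit prefixes all JSON responses with ")]}'" to defeat XSSI.
        changes = json.loads(resp.text[4:])
        return [(c["_number"], c["subject"], c["updated"]) for c in changes]

    if __name__ == "__main__":
        for number, subject, updated in stalled_plus_twos():
            print("%-8s %s (last updated %s)" % (number, subject, updated))

Helpful as such dashboards are, they only redistribute the backlog; as the mail above argues, they cannot shrink it.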
Re: [openstack-dev] [nova] requesting an FFE for SRIOV
The main sr-iov patches have gone through lots of code reviews, manual rebasing, etc. Now we have some critical refactoring work on the existing infra to get it ready. All the code for refactoring and sr-iov is up for review.

I've been doing a lot of work on this recently, and plan to see it through if possible. So, I'll be a sponsor. In the meeting russellb said he would as well. I think he's tied up today, so I'm proxying him in here :)

--Dan
[openstack-dev] [FFE] requesting FFE for LVM ephemeral storage encryption
I would like to request a feature freeze exception for LVM ephemeral storage encryption[1], whose spec[2] was approved early in the Juno release cycle. This feature provides security for data at rest on compute nodes. The proposed feature protects user data from disclosure due to disk block reuse and improper storage media disposal, among other threats, and also eliminates the need to sanitize LVM volumes. The feature is crucial to data security in OpenStack, as explained in the OpenStack Security Guide[3], and benefits cloud users and operators regardless of their industry and scale.

The feature was first submitted for review on August 6, 2013, and two of the three patches implementing it were merged in Icehouse[4,5]. The remaining patch has had approval from a core reviewer for most of the Icehouse and Juno development cycles. The code is well vetted and ready to be merged.

The main concern about accepting this feature pertains to key management. In particular, it uses Barbican to avoid storing keys on the compute host, and Barbican at present has no gate testing. However, the risk of regression in case of failure to integrate Barbican is minimal, because the feature interacts with the key manager through an *existing* abstract keymgr interface, i.e., it has no *explicit* dependence on Barbican. Moreover, the feature provides some measure of security even with the existing place-holder key manager, for example against the disk block reuse attack.

For all of the above reasons I request a feature freeze exception for LVM ephemeral storage encryption.

Best regards,
Dan

1. https://review.openstack.org/#/c/40467/
2. https://blueprints.launchpad.net/nova/+spec/lvm-ephemeral-storage-encryption
3. http://docs.openstack.org/security-guide/content/
4. https://review.openstack.org/#/c/60621/
5. https://review.openstack.org/#/c/61544/
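For readers unfamiliar with the interface Dan refers to: nova's key manager abstraction is deliberately small, which is what makes the Barbican risk containable. A simplified sketch of its shape follows; the real class lives in nova/keymgr/key_mgr.py and passes a request context plus extra keyword arguments on every call, so the signatures here are abridged for illustration:

    # Abridged sketch of nova's abstract key manager interface; see
    # nova/keymgr/key_mgr.py for the real class. Signatures simplified.
    import abc

    class KeyManager(abc.ABC):
        """Contract the ephemeral encryption code programs against."""

        @abc.abstractmethod
        def create_key(self, ctxt, **kwargs):
            """Create a new key; return its UUID."""

        @abc.abstractmethod
        def store_key(self, ctxt, key, **kwargs):
            """Persist an existing key; return its UUID."""

        @abc.abstractmethod
        def get_key(self, ctxt, key_id, **kwargs):
            """Fetch the key identified by key_id."""

        @abc.abstractmethod
        def delete_key(self, ctxt, key_id, **kwargs):
            """Destroy the key identified by key_id."""

Because the encryption code calls only these methods, the place-holder ConfKeyManager (which hands back a single fixed key from configuration) and a future Barbican-backed manager are interchangeable drop-ins.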
Re: [openstack-dev] [Nova][FFE] Feature freeze exception for virt-driver-numa-placement
On 09/04/2014 04:51 PM, Murray, Paul (HP Cloud) wrote:

On 4 September 2014 14:07, Nikola Đipanov ndipa...@redhat.com wrote:

On 09/04/2014 02:31 PM, Sean Dague wrote:

On 09/04/2014 07:58 AM, Nikola Đipanov wrote:

Hi team, I am requesting the exception for the feature from the subject (find the spec at [1] and outstanding changes at [2]). Some reasons why we may want to grant it:

First of all, all patches have been approved in time and just lost the gate race. Rejecting it makes little sense really, as it has been commented on by a good chunk of the core team, most of the invasive stuff (db migrations, for example) has already merged, and the few parts that may seem contentious have either been discussed and agreed upon [3], or can easily be addressed in subsequent bug fixes.

It would be very beneficial to merge it so that we actually get real testing on the feature ASAP (scheduling features are not tested in the gate, so we need to rely on downstream/3rd party/user testing for those).

This statement bugs me. It seems kind of backwards to say we should merge a thing that we don't have a good upstream test plan for, and put it in a release so that the testing happens only downstream.

The objective reality is that many other things have not had upstream testing for a long time (anything that requires more than one compute node in Nova, for example, and any scheduling feature - as I mention clearly above), so I'm not sure how that is backwards from any reasonable point of view. Thanks to folks using them, they are still kept working and bugs get fixed. Getting features into the hands of users is extremely important...

Anyway, not enough to -1 it, but enough to at least say something.

... but I do not want to get into the discussion about software testing here; this is not the place really. However, I do think it is very harmful to respond to an FFE request with such blanket statements and generalizations, if only for the message it sends to the contributors (that we really care more about upholding our own myths as a community than about users and features).

I believe you brought this up as one of your justifications for the FFE. When I read your statement it does sound as though you want to put experimental code in at the final release. I am sure that is not what you had in mind, but I am also sure you can understand Sean's point of view. His point is clear and pertinent to your request.

As the person responsible for Nova in HP I will be interested to see how it operates in practice. I can assure you we will do extensive testing on it before it goes into the wild, and we will not put it into practice if we are not happy.

That is awesome and we as a project are lucky to have that! I would not want things put into practice that users can't use or that they see huge flaws with. I can't help but read this as you being OK with the feature going ahead, though :).

N.

Paul

Paul Murray
Nova Technical Lead, HP Cloud
+44 117 312 9309
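For anyone planning the kind of downstream testing Paul describes, the feature is driven through the hw:numa_* flavor extra specs defined in the spec. A minimal smoke test with python-novaclient might look like the sketch below; the credentials, endpoint, flavor name, and topology values are all placeholder examples:

    # Placeholder smoke test for virt-driver-numa-placement using
    # python-novaclient. Credentials/endpoint/flavor are invented; the
    # hw:numa_* extra spec keys are the ones defined in the spec.
    from novaclient import client

    nova = client.Client('2', 'admin', 'secret', 'demo',
                         'http://keystone.example.com:5000/v2.0')

    flavor = nova.flavors.find(name='m1.large')
    flavor.set_keys({
        'hw:numa_nodes': '2',     # expose two guest NUMA nodes
        'hw:numa_cpus.0': '0,1',  # vCPUs 0-1 live on node 0
        'hw:numa_cpus.1': '2,3',  # vCPUs 2-3 live on node 1
        'hw:numa_mem.0': '2048',  # MB of guest RAM on node 0
        'hw:numa_mem.1': '2048',  # MB of guest RAM on node 1
    })
    # Instances booted from this flavor should now be scheduled and
    # their guest topology built according to the requested layout.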
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 10:18:04AM -0500, Matt Riedemann wrote:

- Changes submitted to nova common code would trigger running of CI tests against the external virt drivers. Each virt driver core team would decide whether they want their driver to be tested upon Nova common changes. I expect that all would choose to be included to the same extent that they are today, so the level of validation of nova code would remain at least at the current level. I don't want to reduce the amount of code testing here, since that's contrary to the direction we're taking wrt testing.

- Changes submitted to virt drivers would trigger running of the CI tests that are applicable, eg changes to the libvirt driver repo would not involve running database migration tests, since all database code is isolated in nova. libvirt changes would not trigger vmware, xenserver, ironic, etc CI systems. Virt driver changes should see fewer false positives in the tests as a result, and those that do occur should be more explicitly related to the code being proposed; eg a change to vmware is not going to trigger a tempest run that uses libvirt, so non-deterministic failures in libvirt will no longer plague vmware developers' reviews. This would also make it possible for VMware CI to be made gating for changes to the VMware virt driver repository, without negatively impacting other virt drivers. So this change should increase testing quality for non-libvirt virt drivers and reduce the pain of false failures for everyone.

[snip]

Even if we split the virt drivers out, libvirt would still be the default in the Tempest gate runs, right?

Yes, what I'm calling the nova common repository would still need to have a tempest job that was gating on at least one virt driver as a sanity check. As mentioned above, I'd pretty much expect that all current tempest jobs for nova common code would continue unchanged. IOW, a libvirt job would still be gating, and there'd still be a number of 3rd party CIs for other virt drivers non-gating too. The only change in testing jobs would be wrt the new git repos for the individual virt drivers. Those would be running only jobs directly related to the code in those repos, ie vmware is tested by a vmware CI job and libvirt is tested by a libvirt CI job.

Regards, Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
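The job-targeting argument is easy to see in miniature. Purely as an illustration - the repo and job names below are invented, and the real mechanism would be openstack-infra's Zuul project configuration rather than Python - per-repo CI replaces one repo that must run the union of every job with a narrow mapping per repo:

    # Purely illustrative: how per-repo CI targeting narrows the job
    # set. Repo and job names are invented for the example.
    JOBS_BY_REPO = {
        'nova': [                  # common code keeps full validation
            'unit-tests',
            'db-migrations',
            'tempest-libvirt',     # sanity gate with one virt driver
        ],
        'nova-virt-libvirt': [
            'unit-tests',
            'tempest-libvirt',     # no vmware/xenserver jobs triggered
        ],
        'nova-virt-vmware': [
            'unit-tests',
            'tempest-vmware-ci',   # 3rd party CI, gating only here
        ],
    }

    def jobs_for_change(repo):
        """Return the CI jobs a proposed change to 'repo' must pass."""
        return JOBS_BY_REPO.get(repo, ['unit-tests'])

Under such a scheme a flaky tempest-libvirt job simply never appears in a vmware change's job list, which is the whole point of the proposal.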
Re: [openstack-dev] [sahara] integration tests in python-saharaclient
Trevor, by the way, what typo? https://review.openstack.org/#/c/118903/

Andrew.

On Thu, Sep 4, 2014 at 7:58 AM, Trevor McKay tmc...@redhat.com wrote:

by the way, what typo?

Trev

On Wed, 2014-09-03 at 14:58 -0700, Andrew Lazarev wrote:

Hi team, Today I've realized that we have some tests called 'integration' in python-saharaclient. I've also found out that Jenkins doesn't use them and that they haven't been runnable since April because of a typo in tox.ini. Does anyone know what these tests are? Does anyone mind if I delete them, since we don't use them anyway?

Thanks, Andrew.
[openstack-dev] [Tripleo] Release Report
1. os-apply-config: release 0.1.19 -- 0.1.20
   -- https://pypi.python.org/pypi/os-apply-config/0.1.20
   -- http://tarballs.openstack.org/os-apply-config/os-apply-config-0.1.20.tar.gz
2. os-refresh-config: no changes, 0.1.7
3. os-collect-config: release 0.1.27 -- 0.1.28
   -- https://pypi.python.org/pypi/os-collect-config/0.1.28
   -- http://tarballs.openstack.org/os-collect-config/os-collect-config-0.1.28.tar.gz
4. os-cloud-config: release 0.1.7 -- 0.1.8
   -- https://pypi.python.org/pypi/os-cloud-config/0.1.8
   -- http://tarballs.openstack.org/os-cloud-config/os-cloud-config-0.1.8.tar.gz
5. diskimage-builder: release 0.1.28 -- 0.1.29
   -- https://pypi.python.org/pypi/diskimage-builder/0.1.29
   -- http://tarballs.openstack.org/diskimage-builder/diskimage-builder-0.1.29.tar.gz
6. dib-utils: release 0.0.5 -- 0.0.6
   -- https://pypi.python.org/pypi/dib-utils/0.0.6
   -- http://tarballs.openstack.org/dib-utils/dib-utils-0.0.6.tar.gz
7. tripleo-heat-templates: release 0.7.4 -- 0.7.5
   -- https://pypi.python.org/pypi/tripleo-heat-templates/0.7.5
   -- http://tarballs.openstack.org/tripleo-heat-templates/tripleo-heat-templates-0.7.5.tar.gz
8. tripleo-image-elements: release 0.8.4 -- 0.8.5
   -- https://pypi.python.org/pypi/tripleo-image-elements/0.8.5
   -- http://tarballs.openstack.org/tripleo-image-elements/tripleo-image-elements-0.8.5.tar.gz
9. tuskar: release 0.4.9 -- 0.4.10
   -- https://pypi.python.org/pypi/tuskar/0.4.10
   -- http://tarballs.openstack.org/tuskar/tuskar-0.4.10.tar.gz
10. python-tuskarclient: release 0.1.9 -- 0.1.10
   -- https://pypi.python.org/pypi/python-tuskarclient/0.1.10
   -- http://tarballs.openstack.org/python-tuskarclient/python-tuskarclient-0.1.10.tar.gz
[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
+1 I very much agree with Dan's proposal.

I am concerned about the difficulties we will face merging patches that spread across various regions: manager, conductor, scheduler, etc. However, I think this is a small price to pay for having more focused teams. IMO we will still have to pay it the moment the scheduler is separated out.

Regards, Vladik
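For readers who have not worked in the layer this thread proposes to split out: a virt driver is an implementation of nova's ComputeDriver contract, and that contract is the boundary each per-driver core team would own. A heavily abridged sketch follows; the real class in nova/virt/driver.py has many more methods, and its methods take many more parameters than shown here:

    # Heavily abridged sketch of the virt driver boundary; see
    # nova/virt/driver.py for the real ComputeDriver class.
    import abc

    class ComputeDriver(abc.ABC):
        """Contract every virt driver (libvirt, vmware, hyperv, ...) fills in."""

        @abc.abstractmethod
        def init_host(self, host):
            """One-time setup when the nova-compute service starts."""

        @abc.abstractmethod
        def spawn(self, context, instance, image_meta, network_info,
                  block_device_info=None):
            """Create and boot a new guest for the given instance."""

        @abc.abstractmethod
        def destroy(self, context, instance, network_info):
            """Tear down a guest and its local resources."""

        @abc.abstractmethod
        def get_info(self, instance):
            """Report the guest's current power state."""

        @abc.abstractmethod
        def get_available_resource(self, nodename):
            """Describe this host's capacity for the scheduler."""

Everything above this interface (API, DB, conductor, scheduler) would stay with nova-core; everything behind it would move to the per-driver repos and teams.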