Re: [openstack-dev] [nova][libvirt] Deprecating the live_migration_flag and block_migration_flag config options
On Fri, 2016-01-08 at 14:11 +, Daniel P. Berrange wrote: > On Thu, Jan 07, 2016 at 09:07:00PM +0000, Mark McLoughlin wrote: > > On Thu, 2016-01-07 at 12:23 +0100, Sahid Orentino Ferdjaoui wrote: > > > On Mon, Jan 04, 2016 at 09:12:06PM +0000, Mark McLoughlin wrote: > > > > Hi > > > > > > > > commit 8ecf93e[1] got me thinking - the live_migration_flag config > > > > option unnecessarily allows operators choose arbitrary behavior of the > > > > migrateToURI() libvirt call, to the extent that we allow the operator > > > > to configure a behavior that can result in data loss[1]. > > > > > > > > I see that danpb recently said something similar: > > > > > > > > https://review.openstack.org/171098 > > > > > > > > "Honestly, I wish we'd just kill off 'live_migration_flag' and > > > > 'block_migration_flag' as config options. We really should not be > > > > exposing low level libvirt API flags as admin tunable settings. > > > > > > > > Nova should really be in charge of picking the correct set of flags > > > > for the current libvirt version, and the operation it needs to > > > > perform. We might need to add other more sensible config options in > > > > their place [..]" > > > > > > Nova should really handle internal flags and this serie is running in > > > the right way. > > > > > > > ... > > > > > > > 4) Add a new config option for tunneled versus native: > > > > > > > > [libvirt] > > > > live_migration_tunneled = true > > > > > > > > This enables the use of the VIR_MIGRATE_TUNNELLED flag. We have > > > > historically defaulted to tunneled mode because it requires the > > > > least configuration and is currently the only way to have a > > > > secure migration channel. > > > > > > > > danpb's quote above continues with: > > > > > > > > "perhaps a "live_migration_secure_channel" to indicate that > > > > migration must use encryption, which would imply use of > > > > TUNNELLED flag" > > > > > > > > So we need to discuss whether the config option should express the > > > > choice of tunneled vs native, or whether it should express another > > > > choice which implies tunneled vs native. > > > > > > > > https://review.openstack.org/263434 > > > > > > We probably have to consider that operator does not know much about > > > internal libvirt flags, so options we are exposing for him should > > > reflect benefice of using them. I commented on your review we should > > > at least explain benefice of using this option whatever the name is. > > > > As predicted, plenty of discussion on this point in the review :) > > > > You're right that we don't give the operator any guidance in the help > > message about how to choose true or false for this: > > > > Whether to use tunneled migration, where migration data is > > transported over the libvirtd connection. 
If True, > > we use the VIR_MIGRATE_TUNNELLED migration flag > > > > libvirt's own docs on this are here: > > > > https://libvirt.org/migration.html#transport > > > > which emphasizes: > > > > - the data copies involved in tunneling > > - the extra configuration steps required for native > > - the encryption support you get when tunneling > > > > The discussions I've seen on this topic wrt Nova have revolved around: > > > > - that tunneling allows for an encrypted transport[1] > > - that qemu's NBD based drive-mirror block migration isn't supported > > using tunneled mode, and that danpb is working on fixing this > > limitation in libvirt > > - "selective" block migration[2] won't work with the fallback qemu > > block migration support, and so won't currently work in tunneled > > mode > > I'm not working on fixing it, but IIRC some other dev had proposed > patches. > > > > > So, the advise to operators would be: > > > > - You may want to choose tunneled=False for improved block migration > > capabilities, but this limitation will go away in future. > > - You may want to choose tunneled=False if you wish to trade and > > encrypted tra
Re: [openstack-dev] [nova][libvirt] Deprecating the live_migration_flag and block_migration_flag config options
On Thu, 2016-01-07 at 12:23 +0100, Sahid Orentino Ferdjaoui wrote: > On Mon, Jan 04, 2016 at 09:12:06PM +0000, Mark McLoughlin wrote: > > Hi > > > > commit 8ecf93e[1] got me thinking - the live_migration_flag config > > option unnecessarily allows operators choose arbitrary behavior of the > > migrateToURI() libvirt call, to the extent that we allow the operator > > to configure a behavior that can result in data loss[1]. > > > > I see that danpb recently said something similar: > > > > https://review.openstack.org/171098 > > > > "Honestly, I wish we'd just kill off 'live_migration_flag' and > > 'block_migration_flag' as config options. We really should not be > > exposing low level libvirt API flags as admin tunable settings. > > > > Nova should really be in charge of picking the correct set of flags > > for the current libvirt version, and the operation it needs to > > perform. We might need to add other more sensible config options in > > their place [..]" > > Nova should really handle internal flags and this serie is running in > the right way. > > > ... > > > 4) Add a new config option for tunneled versus native: > > > > [libvirt] > > live_migration_tunneled = true > > > > This enables the use of the VIR_MIGRATE_TUNNELLED flag. We have > > historically defaulted to tunneled mode because it requires the > > least configuration and is currently the only way to have a > > secure migration channel. > > > > danpb's quote above continues with: > > > > "perhaps a "live_migration_secure_channel" to indicate that > > migration must use encryption, which would imply use of > > TUNNELLED flag" > > > > So we need to discuss whether the config option should express the > > choice of tunneled vs native, or whether it should express another > > choice which implies tunneled vs native. > > > > https://review.openstack.org/263434 > > We probably have to consider that operator does not know much about > internal libvirt flags, so options we are exposing for him should > reflect benefice of using them. I commented on your review we should > at least explain benefice of using this option whatever the name is. As predicted, plenty of discussion on this point in the review :) You're right that we don't give the operator any guidance in the help message about how to choose true or false for this: Whether to use tunneled migration, where migration data is transported over the libvirtd connection. If True, we use the VIR_MIGRATE_TUNNELLED migration flag libvirt's own docs on this are here: https://libvirt.org/migration.html#transport which emphasizes: - the data copies involved in tunneling - the extra configuration steps required for native - the encryption support you get when tunneling The discussions I've seen on this topic wrt Nova have revolved around: - that tunneling allows for an encrypted transport[1] - that qemu's NBD based drive-mirror block migration isn't supported using tunneled mode, and that danpb is working on fixing this limitation in libvirt - "selective" block migration[2] won't work with the fallback qemu block migration support, and so won't currently work in tunneled mode So, the advise to operators would be: - You may want to choose tunneled=False for improved block migration capabilities, but this limitation will go away in future. - You may want to choose tunneled=False if you wish to trade and encrypted transport for a (potentially negligible) performance improvement. Does that make sense? 
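To make that advice concrete, the option's help text could spell out those trade-offs directly. A rough sketch using oslo.config - illustrative wording only, not the actual Nova option definition:

    from oslo_config import cfg

    # Illustrative only: a help string that captures the operator guidance
    # above, rather than just naming the libvirt flag.
    live_migration_tunneled = cfg.BoolOpt(
        'live_migration_tunneled',
        default=True,
        help='Whether to use tunneled migration, where migration data is '
             'transported over the libvirtd connection '
             '(VIR_MIGRATE_TUNNELLED). Tunneling provides an encrypted '
             'channel with no extra configuration, at the cost of extra '
             'data copies and, currently, no support for NBD-based '
             '(selective) block migration. Set to False to use the native '
             'transport if you need those block migration capabilities or '
             'want to trade the encrypted channel for a potentially '
             'negligible performance improvement.')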
As for how to name the option, and as I said in the review, I think it makes sense to be straightforward here and make it clearly about choosing to disable libvirt's tunneled transport. If we name it any other way, I think our explanation for operators will immediately jump to explaining (a) that it influences the TUNNELLED flag, and (b) the differences between the tunneled and native transports. So, if we're going to have to talk about tunneled versus native, why obscure that detail? But, Pawel strongly disagrees. One last point I'd make is this isn't about adding a *new* configuration capability for operators. As we deprecate and remove these configuration options, we need to be careful not to remove a capability that operators are currently depending on for arguably reasonable reasons. [1] - https://review.openstack.org/#/c/171098/ [2] - https://review.openstack.org/#/c/227278 > > 5) Add a new config option for additional migration flags: > > > > [libvirt] > >
[openstack-dev] [nova][libvirt] Deprecating the live_migration_flag and block_migration_flag config options
Hi

commit 8ecf93e[1] got me thinking - the live_migration_flag config option unnecessarily allows operators to choose arbitrary behavior of the migrateToURI() libvirt call, to the extent that we allow the operator to configure a behavior that can result in data loss[1].

I see that danpb recently said something similar:

  https://review.openstack.org/171098

  "Honestly, I wish we'd just kill off 'live_migration_flag' and
  'block_migration_flag' as config options. We really should not be
  exposing low level libvirt API flags as admin tunable settings.

  Nova should really be in charge of picking the correct set of flags
  for the current libvirt version, and the operation it needs to
  perform. We might need to add other more sensible config options in
  their place [..]"

I've just proposed a series of patches, which boils down to the following steps:

1) Modify the approach taken in commit 8ecf93e so that instead of just warning about unsafe use of NON_SHARED_INC, we fix up the config option to a safe value.

   https://review.openstack.org/263431

2) Hard-code the P2P flag for live and block migrations as appropriate for the libvirt driver being used. For the qemu driver, we should always use VIR_MIGRATE_PEER2PEER for both live and block migrations. Without this option, you get:

   Live Migration failure: Requested operation is not valid:
   direct migration is not supported by the connection driver

   OTOH, the Xen driver does not support P2P, and only supports "unmanaged direct connection".

   https://review.openstack.org/263432

3) Require the use of the UNDEFINE_SOURCE flag, and the non-use of the PERSIST_DEST flag. Nova itself persists the domain configuration on the destination host, but it assumes the libvirt migration call removes it from the source host. So it makes no sense to allow operators to configure these flags.

   https://review.openstack.org/263433

4) Add a new config option for tunneled versus native:

   [libvirt]
   live_migration_tunneled = true

   This enables the use of the VIR_MIGRATE_TUNNELLED flag. We have historically defaulted to tunneled mode because it requires the least configuration and is currently the only way to have a secure migration channel.

   danpb's quote above continues with:

   "perhaps a "live_migration_secure_channel" to indicate that
   migration must use encryption, which would imply use of
   TUNNELLED flag"

   So we need to discuss whether the config option should express the choice of tunneled vs native, or whether it should express another choice which implies tunneled vs native.

   https://review.openstack.org/263434

5) Add a new config option for additional migration flags:

   [libvirt]
   live_migration_extra_flags = VIR_MIGRATE_COMPRESSED

   This allows operators to continue to experiment with libvirt behaviors in safe ways without each use case having to be accounted for.

   https://review.openstack.org/263435

   We would disallow setting the following flags via this option:

     VIR_MIGRATE_LIVE
     VIR_MIGRATE_PEER2PEER
     VIR_MIGRATE_TUNNELLED
     VIR_MIGRATE_PERSIST_DEST
     VIR_MIGRATE_UNDEFINE_SOURCE
     VIR_MIGRATE_NON_SHARED_INC
     VIR_MIGRATE_NON_SHARED_DISK

   which would allow the following currently available flags to be set:

     VIR_MIGRATE_PAUSED
     VIR_MIGRATE_CHANGE_PROTECTION
     VIR_MIGRATE_UNSAFE
     VIR_MIGRATE_OFFLINE
     VIR_MIGRATE_COMPRESSED
     VIR_MIGRATE_ABORT_ON_ERROR
     VIR_MIGRATE_AUTO_CONVERGE
     VIR_MIGRATE_RDMA_PIN_ALL

6) Deprecate the existing live_migration_flag and block_migration_flag config options. Operators would be expected to migrate to using the live_migration_tunneled or live_migration_extra_flags config options.
During the deprecation period we would invite feedback as to whether additional config options are needed to cover unanticipated use cases.

   https://review.openstack.org/263436

Thanks in advance for any feedback. I'm going to guess that one piece of feedback will be that some subset of this needs a blueprint (and maybe a spec), and that the blueprint freeze was a month ago, so that subset needs to be punted until after Mitaka? I'd love to be wrong about that, though :)

Thanks,
Mark.

[1] - https://review.openstack.org/228853
[2] - Data loss can occur when you have disk images on shared storage and you specify the VIR_MIGRATE_NON_SHARED_INC or VIR_MIGRATE_NON_SHARED_DISK because during the block migration the disk is copied back over itself while it is in use from another node.
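To illustrate where steps 2) to 5) leave the driver, here is a minimal sketch of how the flag selection could be composed once the low-level flags are no longer operator-controlled. It assumes the libvirt Python binding's VIR_MIGRATE_* constants and the proposed option values; it is a sketch of the idea only, not the actual Nova implementation.

    import libvirt

    # Flags the proposal would refuse to accept via live_migration_extra_flags.
    DISALLOWED_EXTRA_FLAGS = set([
        'VIR_MIGRATE_LIVE', 'VIR_MIGRATE_PEER2PEER', 'VIR_MIGRATE_TUNNELLED',
        'VIR_MIGRATE_PERSIST_DEST', 'VIR_MIGRATE_UNDEFINE_SOURCE',
        'VIR_MIGRATE_NON_SHARED_INC', 'VIR_MIGRATE_NON_SHARED_DISK',
    ])

    def migration_flags(tunneled, extra_flags, block_migration=False):
        """Compose migrateToURI() flags from the proposed config options."""
        # Flags Nova always needs for the qemu driver: live migration,
        # peer-to-peer, and undefine-on-source (Nova persists the domain
        # on the destination itself).
        flags = (libvirt.VIR_MIGRATE_LIVE |
                 libvirt.VIR_MIGRATE_PEER2PEER |
                 libvirt.VIR_MIGRATE_UNDEFINE_SOURCE)
        if block_migration:
            # Incremental copy of non-shared storage; never set for
            # shared-storage migrations (the data loss case in [2]).
            flags |= libvirt.VIR_MIGRATE_NON_SHARED_INC
        if tunneled:
            # live_migration_tunneled = true
            flags |= libvirt.VIR_MIGRATE_TUNNELLED
        for name in extra_flags:
            # live_migration_extra_flags, e.g. "VIR_MIGRATE_COMPRESSED"
            if name in DISALLOWED_EXTRA_FLAGS:
                raise ValueError('%s may not be set via extra flags' % name)
            flags |= getattr(libvirt, name)
        return flags

A tunneled live migration with compression would then be migration_flags(True, ['VIR_MIGRATE_COMPRESSED']), while a block migration over non-shared storage would add block_migration=True.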
Re: [openstack-dev] [puppet] naming of the project
Hi Emilien, On Fri, 2015-04-17 at 10:52 -0400, Emilien Macchi wrote: On 04/16/2015 02:32 PM, Emilien Macchi wrote: On 04/16/2015 02:23 PM, Richard Raseley wrote: Emilien Macchi wrote: Hi all, I sent a patch to openstack/governance to move our project under the big tent, and it came up [1] that we should decide of a project name and be careful about trademarks issues with Puppet name. I would like to hear from Puppetlabs if there is any issue to use Puppet in the project title; also, I open a new etherpad so people can suggest some names: https://etherpad.openstack.org/p/puppet-openstack-naming Thanks, [1] https://review.openstack.org/#/c/172112/1/reference/projects.yaml,cm Emilien, I went ahead and had a discussion with Puppet's legal team on this issue. Unfortunately at this time we are unable to sanction the use of Puppet's name or registered trademarks as part of the project's name. To be clear, this decision is in no way indicative of Puppet not feeling the project is 'worthy' or 'high quality' (in fact the opposite is true), but rather is a purely defensive decision. We are in the process of reevaluating our usage guidelines, but there is no firm timetable as of this moment. I guess our best option is to choose a name without Puppet in the title. We will proceed to a vote after all proposals on the etherpad. While we hear from Puppetlabs about the trademark potential issue, I would like to run a vote for a name that does not contain `Puppet`, so we can go ahead on the governance thing. I took all proposals on the etherpad [1] and created a poll that will close on next Tuesday 3pm, just before our weekly meeting so we will make it official. Anyone is welcome to vote: http://civs.cs.cornell.edu/cgi-bin/vote.pl?id=E_6c81ad92b71422d6akey=f2e85294f17caa9a Any feedback on the vote itself is also welcome. Thanks, [1] https://etherpad.openstack.org/p/puppet-openstack-naming Another idea on this ... a number of OpenStack projects have purely descriptive names (and no 'service' attribute), for example Infrastructure, Documentation, Security, Quality Assurance, and Release Cycle Management. Simply calling the project OpenStack Puppet Modules would follow that pattern, and a straightforward descriptive use of the Puppet name may not be objectionable. Mark. __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] interesting problem with config filter
Hi Doug,

On Mon, 2014-12-08 at 15:58 -0500, Doug Hellmann wrote:

As we’ve discussed a few times, we want to isolate applications from the configuration options defined by libraries. One way we have of doing that is the ConfigFilter class in oslo.config. When a regular ConfigOpts instance is wrapped with a filter, a library can register new options on the filter that are not visible to anything that doesn’t have the filter object. Or to put it more simply, the configuration options registered by the library should not be part of the public API of the library.

Unfortunately, the Neutron team has identified an issue with this approach. We have a bug report [1] from them about the way we’re using config filters in oslo.concurrency specifically, but the issue applies to their use everywhere. The neutron tests set the default for oslo.concurrency’s lock_path variable to “$state_path/lock”, and the state_path option is defined in their application. With the filter in place, interpolation of $state_path to generate the lock_path value fails because state_path is not known to the ConfigFilter instance.

It seems that Neutron sets this default in its etc/neutron.conf file in its git tree:

  lock_path = $state_path/lock

I think we should be aiming for defaults like this to be set in code, and for the sample config files to contain nothing but comments. So, neutron should do:

  lockutils.set_defaults(lock_path='$state_path/lock')

That's a side detail, however.

The reverse would also happen (if the value of state_path was somehow defined to depend on lock_path),

This dependency wouldn't/shouldn't be code - because Neutron *code* shouldn't know about the existence of library config options. Neutron deployers absolutely will be aware of lock_path however.

and that’s actually a bigger concern to me. A deployer should be able to use interpolation anywhere, and not worry about whether the options are in parts of the code that can see each other. The values are all in one file, as far as they know, and so interpolation should “just work”.

Yes, if a deployer looks at a sample configuration file, all options listed in there seem like they're in-play for substitution use within the value of another option. For string substitution only, I'd say there should be a global namespace where all options are registered.

Now ... one caveat on all of this ... I do think the string substitution feature is pretty obscure and mostly just used in default values.

I see a few solutions:

1. Don’t use the config filter at all.
2. Make the config filter able to add new options and still see everything else that is already defined (only filter in one direction).
3. Leave things as they are, and make the error message better.
4. Just tackle this specific case by making lock_path implicitly relative to a base path the application can set via an API, so Neutron would do: lockutils.set_base_path(CONF.state_path) at startup.
5. Make the toplevel ConfigOpts aware of all filters hanging off it, and somehow cycle through all of those filters just when doing string substitution.

Because of the deployment implications of using the filter, I’m inclined to go with choice 1 or 2. However, choice 2 leaves open the possibility of a deployer wanting to use the value of an option defined by one filtered set of code when defining another. I don’t know how frequently that might come up, but it seems like the error would be very confusing, especially if both options are set in the same config file.
I think that leaves option 1, which means our plans for hiding options from applications need to be rethought. Does anyone else see another solution that I’m missing?

I'd do something like (3) and (4), then wait to see if it crops up multiple times in the future before tackling a more general solution.

With option (1), the basic thing to think about is how to maintain API compatibility - if we expose the options through the API, how do we deal with future moves, removals, renames, and changing semantics of those config options.

Mark.
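For readers who have not hit it, the failure mode under discussion looks roughly like this. A minimal sketch, assuming oslo.config's ConfigOpts/ConfigFilter API; the option names mirror the Neutron case, and the exact exception raised depends on the oslo.config version.

    from oslo_config import cfg, cfgfilter

    conf = cfg.ConfigOpts()
    # Application-level option, registered on the unfiltered config object.
    conf.register_opt(cfg.StrOpt('state_path', default='/var/lib/neutron'))
    conf([])  # parse an empty command line so options become readable

    # A library wraps the config object in a filter so that the options it
    # registers stay out of the application's public namespace.
    filtered = cfgfilter.ConfigFilter(conf)
    filtered.register_opt(cfg.StrOpt('lock_path', default='$state_path/lock'))

    # The application never sees the library's option:
    #   conf.lock_path  ->  NoSuchOptError
    # and, per the bug above, resolving the library's option can fail because
    # '$state_path' has to be interpolated against options the filtered view
    # does not know about:
    print(filtered.lock_path)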
Re: [openstack-dev] Thoughts on OpenStack Layers and a Big Tent model
Hey On Thu, 2014-09-18 at 11:53 -0700, Monty Taylor wrote: Hey all, I've recently been thinking a lot about Sean's Layers stuff. So I wrote a blog post which Jim Blair and Devananda were kind enough to help me edit. http://inaugust.com/post/108 Lots of great stuff here, but too much to respond to everything in detail. I love the way you've framed this in terms of the needs of developers, distributors, deployers and end users. I'd like to make sure we're focused on tackling those places where we're failing these groups, so: - Developers I think we're catering pretty well to developers with the big tent concept of Programs. There's been some good discussion about how Programs could be better at embracing projects in their related area, and that would be great to pursue. But the general concept - of endorsing and empowering teams of people collaborating in the OpenStack way - has a lot of legs, I think. I also think our release cadence does a pretty good job of serving developers. We've talked many times about the benefit of it, and I'd like to see it applied to more than just the server projects. OTOH, the integrated gate is straining, and a source of frustration for everyone. You raise the question of whether everything currently in the integrated release needs to be co-gated, and I totally agree that needs re-visiting. - Distributors We may be doing a better job of catering to distributors than any other group. For example, things like the release cadence, stable branch and common set of dependencies works pretty well. . The concept of an integrated release (with an incubation process) is great, because it nicely defines a set of stuff that distributors should ship. Certainly, life would be more difficult for distributors if there was a smaller set of projects in the release and a whole bunch of other projects which are interesting to distro users, but with an ambiguous level of commitment from our community. Right now, our integration process has a huge amount of influence over what gets shipped by distros, and that in turn serves distro users by ensuring a greater level of commonality between distros. - Operators I think the feedback we've been getting over the past few cycles suggests we are failing this group the most. Operators want to offer a compelling set of services to their users, but they want those services to be stable, performant, and perhaps most importantly, cost-effective. No operator wants to have to invest a huge amount of time in getting a new service running. You suggest a Production Ready tag. Absolutely - our graduation of projects has been interpreted as meaning production ready, when it's actually more useful as a signal to distros rather than operators. Graduation does not necessarily imply that a service is ready for production, no matter how you define production. I'd like to think we could give more nuanced advice to operators than a simple tag, but perhaps the information gathering process that projects would need to go through to be awarded that tag would uncover the more detailed information for operators. You could question whether the TC is the right body for this process. How might it work if the User Committee owned this? There are many other ways we can and should help operators, obviously, but this setting expectations is the aspect most relevant to this discussion. - End users You're right that we don't pay sufficient attention to this group. For me, the highest priority challenge here is interoperability. 
Particularly interoperability between public clouds. The only real interop effort to date you can point to is the board-owned DefCore and RefStack efforts. The idea being that a trademark program with API testing requirements will focus minds on interoperability. I'd love us (as a community) to be making more rapid progress on interoperability, but at least there are no encouraging signs that we should make some definite progress soon. Your end-user focused concrete suggestions (#7-#10) are interesting, and I find myself thinking about how much of a positive effect on interop each of them would have. For example, making our tools multi-cloud aware would help encourage people to demand interop from their providers. I also agree that end-user tools should support older versions of our APIs, but don't think that necessarily implies rolling releases. So, if I was to pick the areas which I think would address our most pressing challenges: 1) Shrinking the integrated gate, and allowing per-project testing strategies other than shoving every integrated project into the gate. 2) Giving more direction to operators about the readiness of our projects for different use cases. A process around awarding
Re: [openstack-dev] [Zaqar] Zaqar graduation (round 2) [was: Comments on the concerns arose during the TC meeting]
On Wed, 2014-09-10 at 14:51 +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] Based on the feedback from the meeting[3], the current main concern is: - Do we need a messaging service with a feature-set akin to SQS+SNS? [...] I think we do need, as Samuel puts it, some sort of durable message-broker/queue-server thing. It's a basic application building block. Some claim it's THE basic application building block, more useful than database provisioning. It's definitely a layer above pure IaaS, so if we end up splitting OpenStack into layers this clearly won't be in the inner one. But I think IaaS+ basic application building blocks belong in OpenStack one way or another. That's the reason I supported Designate (everyone needs DNS) and Trove (everyone needs DBs). With that said, I think yesterday there was a concern that Zaqar might not fill the some sort of durable message-broker/queue-server thing role well. The argument goes something like: if it was a queue-server then it should actually be built on top of Rabbit; if it was a message-broker it should be built on top of postfix/dovecot; the current architecture is only justified because it's something in between, so it's broken. I guess I don't mind that much zaqar being something in between: unless I misunderstood, exposing extra primitives doesn't prevent the queue-server use case from being filled. Even considering the message-broker case, I'm also not convinced building it on top of postfix/dovecot would be a net win compared to building it on top of Redis, to be honest. AFAICT, this part of the debate boils down to the following argument: If Zaqar implemented messaging-as-a-service with only queuing semantics (and no random access semantics), it's design would naturally be dramatically different and simply implement a multi-tenant REST API in front of AMQP queues like this: https://www.dropbox.com/s/yonloa9ytlf8fdh/ZaqarQueueOnly.png?dl=0 and that this architecture would allow for dramatically improved throughput for end-users while not making the cost of providing the service prohibitive to operators. You can't dismiss that argument out-of-hand, but I wonder (a) whether the claimed performance improvement is going to make a dramatic difference to the SQS-like use case and (b) whether backing this thing with an RDBMS and multiple highly available, durable AMQP broker clusters is going to be too much of a burden on operators for whatever performance improvements it does gain. But the troubling part of this debate is where we repeatedly batter the Zaqar team with hypotheses like these and appear to only barely entertain their carefully considered justification for their design decisions like: https://wiki.openstack.org/wiki/Frequently_asked_questions_%28Zaqar%29#Is_Zaqar_a_provisioning_service_or_a_data_API.3F https://wiki.openstack.org/wiki/Frequently_asked_questions_%28Zaqar%29#What_messaging_patterns_does_Zaqar_support.3F I would like to see an SQS-like API provided by OpenStack, I accept the reasons for Zaqar's design decisions to date, I respect that those decisions were made carefully by highly competent members of our community and I expect Zaqar to evolve (like all projects) in the years ahead based on more real-world feedback, new hypotheses or ideas, and lessons learned from trying things out. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting
On Wed, 2014-09-10 at 12:46 -0700, Monty Taylor wrote: On 09/09/2014 07:04 PM, Samuel Merritt wrote: On 9/9/14, 4:47 PM, Devananda van der Veen wrote: The questions now before us are: - should OpenStack include, in the integrated release, a messaging-as-a-service component? I certainly think so. I've worked on a few reasonable-scale web applications, and they all followed the same pattern: HTTP app servers serving requests quickly, background workers for long-running tasks, and some sort of durable message-broker/queue-server thing for conveying work from the first to the second. A quick straw poll of my nearby coworkers shows that every non-trivial web application that they've worked on in the last decade follows the same pattern. While not *every* application needs such a thing, web apps are quite common these days, and Zaqar satisfies one of their big requirements. Not only that, it does so in a way that requires much less babysitting than run-your-own-broker does. Right. But here's the thing. What you just described is what we all thought zaqar was aiming to be in the beginning. We did not think it was a GOOD implementation of that, so while we agreed that it would be useful to have one of those, we were not crazy about the implementation. Those generalizations are uncomfortably sweeping. What Samuel just described is one of the messaging patterns that Zaqar implements and some (members of the TC?) believed that this messaging pattern was the only pattern that Zaqar aimed to implement. Some (members of the TC?) formed strong, negative opinions about how this messaging pattern was implemented, but some/all of those same people agreed a messaging API implementing those semantics would be a useful thing to have. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] libvirt version_cap, a postmortem
Hey The libvirt version_cap debacle continues to come up in conversation and one perception of the whole thing appears to be: A controversial patch was ninjaed by three Red Hat nova-cores and then the same individuals piled on with -2s when a revert was proposed to allow further discussion. I hope it's clear to everyone why that's a pretty painful thing to hear. However, I do see that I didn't behave perfectly here. I apologize for that. In order to understand where this perception came from, I've gone back over the discussions spread across gerrit and the mailing list in order to piece together a precise timeline. I've appended that below. Some conclusions I draw from that tedious exercise: - Some people came at this from the perspective that we already have a firm, unwritten policy that all code must have functional written tests. Others see that test all the things is interpreted as a worthy aspiration, but is only one of a number of nuanced factors that needs to be taken into account when considering the addition of a new feature. i.e. the former camp saw Dan Smith's devref addition as attempting to document an existing policy (perhaps even a more forgiving version of an existing policy), whereas other see it as a dramatic shift to a draconian implementation of test all the things. - Dan Berrange, Russell and I didn't feel like we were ninjaing a controversial patch - you can see our perspective expressed in multiple places. The patch would have helped the live snapshot issue, and has other useful applications. It does not affect the broader testing debate. Johannes was a solitary voice expressing concerns with the patch, and you could see that Dan was particularly engaged in trying to address those concerns and repeating his feeling that the patch was orthogonal to the testing debate. That all being said - the patch did merge too quickly. - What exacerbates the situation - particularly when people attempt to look back at what happened - is how spread out our conversations are. You look at the version_cap review and don't see any of the related discussions on the devref policy review nor the mailing list threads. Our disjoint methods of communicating contribute to misunderstandings. - When it came to the revert, a couple of things resulted in misunderstandings, hurt feelings and frayed tempers - (a) that our retrospective veto revert policy wasn't well understood and (b) a feeling that there was private, in-person grumbling about us at the mid-cycle while we were absent, with no attempt to talk to us directly. To take an even further step back - successful communities like ours require a huge amount of trust between the participants. Trust requires communication and empathy. If communication breaks down and the pressure we're all under erodes our empathy for each others' positions, then situations can easily get horribly out of control. This isn't a pleasant situation and we should all strive for better. However, I tend to measure our flamewars against this: https://mail.gnome.org/archives/gnome-2-0-list/2001-June/msg00132.html GNOME in June 2001 was my introduction to full-time open-source development, so this episode sticks out in my mind. The two individuals in that email were/are immensely capable and reasonable people, yet ... So far, we're doing pretty okay compared to that and many other open-source flamewars. Let's make sure we continue that way by avoiding letting situations fester. Thanks, and sorry for being a windbag, Mark. 
--- = July 1 = The starting point is this review: https://review.openstack.org/103923 Dan Smith proposes a policy that the libvirt driver may not use libvirt features until they have been available in Ubuntu or Fedora for at least 30 days. The commit message mentions: broken us in the past when we add a new feature that requires a newer libvirt than we test with, and we discover that it's totally broken when we upgrade in the gate. which AIUI is a reference to the libvirt live snapshot issue the previous week, which is described here: https://review.openstack.org/102643 where upgrading to Ubuntu Trusty meant the libvirt version in use in the gate went from 0.9.8 to 1.2.2, which caused the live snapshot code paths in Nova for the first time, which appeared to be related to some serious gate instability (although the exact root cause wasn't identified). Some background on the libvirt version upgrade can be seen here: http://lists.openstack.org/pipermail/openstack-dev/2014-March/thread.html#30284 = July 1 - July 8 = Back and forth debate mostly between Dan Smith and Dan Berrange. Sean votes +2, Dan Berrange votes -2. = July 14 = Russell adds his support to Dan Berrange's position, votes -2. Some debate between Dan and Dan continues. Joe Gordon votes +2. Matt Riedemann expresses support-in-principal for
Re: [openstack-dev] [oslo.messaging] Request to include AMQP 1.0 support in Juno-3
On Thu, 2014-08-28 at 13:24 +0200, Flavio Percoco wrote: On 08/27/2014 03:35 PM, Ken Giusti wrote: Hi All, I believe Juno-3 is our last chance to get this feature [1] included into olso.messaging. I honestly believe this patch is about as low risk as possible for a change that introduces a whole new transport into oslo.messaging. The patch shouldn't affect the existing transports at all, and doesn't come into play unless the application specifically turns on the new 'amqp' transport, which won't be the case for existing applications. The patch includes a set of functional tests which exercise all the messaging patterns, timeouts, and even broker failover. These tests do not mock out any part of the driver - a simple test broker is included which allows the full driver codepath to be executed and verified. IFAIK, the only remaining technical block to adding this feature, aside from core reviews [2], is sufficient infrastructure test coverage. We discussed this a bit at the last design summit. The root of the issue is that this feature is dependent on a platform-specific library (proton) that isn't in the base repos for most of the CI platforms. But it is available via EPEL, and the Apache QPID team is actively working towards getting the packages into Debian (a PPA is available in the meantime). In the interim I've proposed a non-voting CI check job that will sanity check the new driver on EPEL based systems [3]. I'm also working towards adding devstack support [4], which won't be done in time for Juno but nevertheless I'm making it happen. I fear that this feature's inclusion is stuck in a chicken/egg deadlock: the driver won't get merged until there is CI support, but the CI support won't run correctly (and probably won't get merged) until the driver is available. The driver really has to be merged first, before I can continue with CI/devstack development. [1] https://blueprints.launchpad.net/oslo.messaging/+spec/amqp10-driver-implementation [2] https://review.openstack.org/#/c/75815/ [3] https://review.openstack.org/#/c/115752/ [4] https://review.openstack.org/#/c/109118/ Hi Ken, Thanks a lot for your hard work here. As I stated in my last comment on the driver's review, I think we should let this driver land and let future patches improve it where/when needed. I agreed on letting the driver land as-is based on the fact that there are patches already submitted ready to enable the gates for this driver. I feel bad that the driver has been in a pretty complete state for quite a while but hasn't received a whole lot of reviews. There's a lot of promise to this idea, so it would be ideal if we could unblock it. One thing I've been meaning to do this cycle is add concrete advice for operators on the state of each driver. I think we'd be a lot more comfortable merging this in Juno if we could somehow make it clear to operators that it's experimental right now. My idea was: - Write up some notes which discusses the state of each driver e.g. - RabbitMQ - the default, used by the majority of OpenStack deployments, perhaps list some of the known bugs, particularly around HA. - Qpid - suitable for production, but used in a limited number of deployments. Again, list known issues. Mention that it will probably be removed with the amqp10 driver matures. 
  - Proton/AMQP 1.0 - experimental, in active development, will support multiple brokers and topologies, perhaps a pointer to a wiki page with the current TODO list
  - ZeroMQ - unmaintained and deprecated, planned for removal in Kilo

- Propose this addition to the API docs and ask the operators list for feedback
- Propose a patch which adds a load-time deprecation warning to the ZeroMQ driver
- Include a load-time experimental warning in the proton driver

Thoughts on that?

(I understand the ZeroMQ situation needs further discussion - I don't think that's on-topic for the thread, I was just using it as an example of what kind of advice we'd be giving in these docs)

Mark.
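A load-time warning of the kind suggested in the last bullet can be as small as a log message when the driver is instantiated; a rough sketch only, not the actual oslo.messaging code:

    import logging

    LOG = logging.getLogger(__name__)

    class ProtonDriver(object):
        """AMQP 1.0 driver (hypothetical sketch of a status warning)."""

        def __init__(self, conf, url):
            LOG.warning('The AMQP 1.0 (proton) driver is experimental and '
                        'under active development; it is not yet recommended '
                        'for production use.')
            self.conf = conf
            self.url = url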
Re: [openstack-dev] [oslo] usage patterns for oslo.config
On Tue, 2014-08-26 at 10:00 -0400, Doug Hellmann wrote: On Aug 26, 2014, at 6:30 AM, Mark McLoughlin mar...@redhat.com wrote: On Mon, 2014-08-11 at 15:06 -0400, Doug Hellmann wrote: On Aug 8, 2014, at 7:22 PM, Devananda van der Veen devananda@gmail.com wrote: On Fri, Aug 8, 2014 at 12:41 PM, Doug Hellmann d...@doughellmann.com wrote: That’s right. The preferred approach is to put the register_opt() in *runtime* code somewhere before the option will be used. That might be in the constructor for a class that uses an option, for example, as described in http://docs.openstack.org/developer/oslo.config/cfg.html#registering-options Doug Interesting. I've been following the prevailing example in Nova, which is to register opts at the top of a module, immediately after defining them. Is there a situation in which one approach is better than the other? The approach used in Nova is the “old” way of doing it. It works, but assumes that all of the application code is modifying a global configuration object. The runtime approach allows you to pass a configuration object to a library, which makes it easier to mock the configuration for testing and avoids having the configuration options bleed into the public API of the library. We’ve started using the runtime approach in new Oslo libraries that have configuration options, but changing the implementation in existing application code isn’t strictly necessary. I've been meaning to dig up some of the old threads and reviews to document how we got here. But briefly: * this global CONF variable originates from the gflags FLAGS variable in Nova before oslo.config * I was initially determined to get rid of any global variable use and did a lot of work to allow glance use oslo.config without a global variable * one example detail of this work - when you use paste.deploy to load an app, you have no ability to pass a config object through paste.deploy to the app. I wrote a little helper that used a thread-local variable to mimic this pass-through. * with glance done, I moved on to making keystone use oslo.config and initially didn't use the global variable. Then I ran into a veto from termie who felt very strongly that a global variable should be used. * in the end, I bought the argument that the use of a global variable was pretty deeply ingrained (especially in Nova) and that we should aim for consistent coding patterns across projects (i.e. Oslo shouldn't be just about shared code, but also shared patterns). The only realistic standard pattern we could hope for was the use of the global variable. * with that agreed, we reverted glance back to using a global variable and all projects followed suit * the case of libraries is different IMO - we'd be foolish to design APIs which lock us into using the global object So ... I wouldn't quite agree that this is the new way vs the old way, but I think it would be reasonable to re-open the discussion about using the global object in our applications. Perhaps, at least, we could reduce our dependence on it. The aspect I was calling “old” was the “register options at import time” pattern, not the use of a global. Whether we use a global or not, registering options at runtime in a code path that will be using them is better than relying on import ordering to ensure options are registered before they are used. I don't think I've seen code (except for obscure cases) which uses the CONF global directly (as opposed to being passed CONF as a parameter) but doesn't register the options at import time. Mark. 
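Side by side, the two registration styles being contrasted look roughly like this; a minimal sketch with an invented option name:

    from oslo_config import cfg

    _opt = cfg.StrOpt('scheduler_topic', default='scheduler')

    # Import-time style (Nova and most applications): register against the
    # global CONF object as soon as the module is loaded.
    CONF = cfg.CONF
    CONF.register_opt(_opt)

    # Runtime style (newer Oslo libraries): register on whatever config
    # object is handed in, just before the option is used, keeping the
    # option out of the library's public API and easier to mock in tests.
    class SchedulerClient(object):
        def __init__(self, conf=None):
            self.conf = conf or cfg.CONF
            self.conf.register_opt(_opt)

        @property
        def topic(self):
            return self.conf.scheduler_topic

    # Usage (after the application has called cfg.CONF(argv) to parse its
    # command line and config files):
    #   SchedulerClient().topic  ->  'scheduler'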
Re: [openstack-dev] [oslo] Launchpad tracking of oslo projects
On Fri, 2014-08-22 at 11:59 +0200, Thierry Carrez wrote: TL;DR: Let's create an Oslo projectgroup in Launchpad to track work across all Oslo libraries. In library projects, let's use milestones connected to published versions rather than the common milestones. Sounds good to me, Thierry. Thanks for the thoughtful proposal. The part about using integrated release milestones was more about highlighting that we follow a similar development model and cadence - i.e. it's helpful from a planning perspective to predict whether a given feature is likely to land in juno-1, juno-2 or juno-3. When it comes to release time, though, I'd much rather have a launchpad milestone that reflects the release itself rather than the development milestone. Sounds like we need to choose between using launchpad milestones for planning or releases, and choosing the latter makes sense to me. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] usage patterns for oslo.config
On Mon, 2014-08-11 at 15:06 -0400, Doug Hellmann wrote: On Aug 8, 2014, at 7:22 PM, Devananda van der Veen devananda@gmail.com wrote: On Fri, Aug 8, 2014 at 12:41 PM, Doug Hellmann d...@doughellmann.com wrote: That’s right. The preferred approach is to put the register_opt() in *runtime* code somewhere before the option will be used. That might be in the constructor for a class that uses an option, for example, as described in http://docs.openstack.org/developer/oslo.config/cfg.html#registering-options Doug Interesting. I've been following the prevailing example in Nova, which is to register opts at the top of a module, immediately after defining them. Is there a situation in which one approach is better than the other? The approach used in Nova is the “old” way of doing it. It works, but assumes that all of the application code is modifying a global configuration object. The runtime approach allows you to pass a configuration object to a library, which makes it easier to mock the configuration for testing and avoids having the configuration options bleed into the public API of the library. We’ve started using the runtime approach in new Oslo libraries that have configuration options, but changing the implementation in existing application code isn’t strictly necessary. I've been meaning to dig up some of the old threads and reviews to document how we got here. But briefly: * this global CONF variable originates from the gflags FLAGS variable in Nova before oslo.config * I was initially determined to get rid of any global variable use and did a lot of work to allow glance use oslo.config without a global variable * one example detail of this work - when you use paste.deploy to load an app, you have no ability to pass a config object through paste.deploy to the app. I wrote a little helper that used a thread-local variable to mimic this pass-through. * with glance done, I moved on to making keystone use oslo.config and initially didn't use the global variable. Then I ran into a veto from termie who felt very strongly that a global variable should be used. * in the end, I bought the argument that the use of a global variable was pretty deeply ingrained (especially in Nova) and that we should aim for consistent coding patterns across projects (i.e. Oslo shouldn't be just about shared code, but also shared patterns). The only realistic standard pattern we could hope for was the use of the global variable. * with that agreed, we reverted glance back to using a global variable and all projects followed suit * the case of libraries is different IMO - we'd be foolish to design APIs which lock us into using the global object So ... I wouldn't quite agree that this is the new way vs the old way, but I think it would be reasonable to re-open the discussion about using the global object in our applications. Perhaps, at least, we could reduce our dependence on it. Oh look, we have a FAQ on this: https://wiki.openstack.org/wiki/Oslo#Why_does_oslo.config_have_a_CONF_object.3F_Global_object_SUCK.21 Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] [ptls] The Czar system, or how to scale PTLs
On Fri, 2014-08-22 at 11:01 -0400, Zane Bitter wrote: I don't see that as something the wider OpenStack community needs to dictate. We have a heavyweight election process for PTLs once every cycle because that used to be the process for electing the TC. Now that it no longer serves this dual purpose, PTL elections have outlived their usefulness. If projects want to have a designated tech lead, let them. If they want to have the lead elected in a form of representative democracy, let them. But there's no need to impose that process on every project. If they want to rotate the tech lead every week instead of every 6 months, why not let them? We'll soon see from experimentation which models work. Let a thousand flowers bloom, c. I like the idea of projects being free to experiment with their governance rather than the TC mandating detailed governance models from above. But I also like the way Thierry is taking a trend we're seeing work out well across multiple projects, and generalizing it. If individual projects are to adopt explicit PTL duty delegation, then all the better if those projects adopt it in similar ways. i.e. this should turn out to be an optional best practice model that projects can choose to adopt, in much the way the *-specs repo idea took hold. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] The future of the integrated release
On Mon, 2014-08-18 at 14:23 +0200, Thierry Carrez wrote: Clint Byrum wrote: Here's why folk are questioning Ceilometer: Nova is a set of tools to abstract virtualization implementations. Neutron is a set of tools to abstract SDN/NFV implementations. Cinder is a set of tools to abstract block-device implementations. Trove is a set of tools to simplify consumption of existing databases. Sahara is a set of tools to simplify Hadoop consumption. Swift is a feature-complete implementation of object storage, none of which existed when it was started. Keystone supports all of the above, unifying their auth. Horizon supports all of the above, unifying their GUI. Ceilometer is a complete implementation of data collection and alerting. There is no shortage of implementations that exist already. I'm also core on two projects that are getting some push back these days: Heat is a complete implementation of orchestration. There are at least a few of these already in existence, though not as many as their are data collection and alerting systems. TripleO is an attempt to deploy OpenStack using tools that OpenStack provides. There are already quite a few other tools that _can_ deploy OpenStack, so it stands to reason that people will question why we don't just use those. It is my hope we'll push more into the unifying the implementations space and withdraw a bit from the implementing stuff space. So, you see, people are happy to unify around a single abstraction, but not so much around a brand new implementation of things that already exist. Right, most projects focus on providing abstraction above implementations, and that abstraction is where the real domain expertise of OpenStack should be (because no one else is going to do it for us). Every time we reinvent something, we are at larger risk because we are out of our common specialty, and we just may not be as good as the domain specialists. That doesn't mean we should never reinvent something, but we need to be damn sure it's a good idea before we do. It's sometimes less fun to piggyback on existing implementations, but if they exist that's probably what we should do. It's certainly a valid angle to evaluate projects on, but it's also easy to be overly reductive about it - e.g. that rather than re-implement virtualization management, Nova should just be a thin abstraction over vSphere, XenServer and oVirt. To take that example, I don't think we as a project should be afraid of having such discussions but it wouldn't be productive to frame that conversation as the sky is falling, Nova re-implements the wheel, we should de-integrate it. While Ceilometer is far from alone in that space, what sets it apart is that even after it was blessed by the TC as the one we should all converge on, we keep on seeing competing implementations for some (if not all) of its scope. Convergence did not happen, and without convergence we struggle in adoption. We need to understand why, and if this is fixable. Convergence did not happen is a little unfair. It's certainly a busy space, and things like Monasca and InfluxDB are new developments. I'm impressed at how hard the Ceilometer team works to embrace such developments and patiently talks through possibilities for convergence. This attitude is something we should be applauding in an integrated project. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Retrospective veto revert policy
On Tue, 2014-08-12 at 15:56 +0100, Mark McLoughlin wrote: Hey (Terrible name for a policy, I know) From the version_cap saga here: https://review.openstack.org/110754 I think we need a better understanding of how to approach situations like this. Here's my attempt at documenting what I think we're expecting the procedure to be: https://etherpad.openstack.org/p/nova-retrospective-veto-revert-policy If it sounds reasonably sane, I can propose its addition to the Development policies doc. (In the spirit of we really need to step back and laugh at ourselves sometimes ... ) Two years ago, we were worried about patches getting merged in less than 2 hours and had a discussion about imposing a minimum review time. How times have changed! Is it even possible to land a patch in less than two hours now? :) Looking back over the thread, this part stopped me in my tracks: https://lists.launchpad.net/openstack/msg08625.html On Tue, Mar 13, 2012, Mark McLoughlin markmc@xx wrote: Sometimes there can be a few folks working through an issue together and the patch gets pushed and approved so quickly that no-one else gets a chance to review. Everyone has an opportunity to review even after a patch gets merged. JE It's not quite perfect, but if you squint you could conclude that Johannes and I have both completely reversed our opinions in the intervening two years :) The lesson I take from that is to not get too caught up in the current moment. We're growing and evolving rapidly. If we assume everyone is acting in good faith, and allow each other to debate earnestly without feelings getting hurt ... we should be able to work through anything. Now, back on topic - digging through that thread, it doesn't seem we settled on the idea of we can just revert it later if someone has an objection in this thread. Does anyone recall when that idea first came up? Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Retrospective veto revert policy
On Tue, 2014-08-12 at 15:56 +0100, Mark McLoughlin wrote: Hey (Terrible name for a policy, I know) From the version_cap saga here: https://review.openstack.org/110754 I think we need a better understanding of how to approach situations like this. Here's my attempt at documenting what I think we're expecting the procedure to be: https://etherpad.openstack.org/p/nova-retrospective-veto-revert-policy If it sounds reasonably sane, I can propose its addition to the Development policies doc. Proposed here: https://review.openstack.org/114188 Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] The future of the integrated release
On Tue, 2014-08-05 at 18:03 +0200, Thierry Carrez wrote: Hi everyone, With the incredible growth of OpenStack, our development community is facing complex challenges. How we handle those might determine the ultimate success or failure of OpenStack. With this cycle we hit new limits in our processes, tools and cultural setup. This resulted in new limiting factors on our overall velocity, which is frustrating for developers. This resulted in the burnout of key firefighting resources. This resulted in tension between people who try to get specific work done and people who try to keep a handle on the big picture. Always fun catching up on threads like this after being away ... :) I think the thread has revolved around three distinct areas: 1) The per-project review backlog, its implications for per-project velocity, and ideas for new workflows or tooling 2) Cross-project scaling issues that get worse as we add more integrated projects 3) The factors that go into deciding whether a project belongs in the integrated release - including the appropriateness of its scope, the soundness of its architecture and how production ready it is. The first is important - hugely important - but I don't think it has any bearing on the makeup, scope or contents of the integrated release, but certainly will have a huge bearing on the success of the release and the project more generally. The third strikes me as a part of the natural evolution around how we think about the integrated release. I don't think there's any particular crisis or massive urgency here. As the TC considers proposals to integrate (or de-integrate) projects, we'll continue to work through this. These debates are contentious enough that we should avoid adding unnecessary drama to them by conflating the issues with more pressing, urgent issues. I think the second area is where we should focus. We're concerned that we're hitting a breaking point with some cross-project issues - like release management, the gate, a high level of non-deterministic test failures, insufficient cross-project collaboration on technical debt (e.g. via Oslo), difficulty in reaching consensus on new cross-project initiatives (Sean gave the examples of Group Based Policy and Rally) - such that drastic measures are required. Like maybe we should not accept any new integrated projects in this cycle while we work through those issues. Digging deeper into that means itemizing these cross-project scaling issues, figuring out which of them need drastic intervention, discussing what the intervention might be and the realistic overall effects of those interventions. AFAICT, the closest we've come in the thread to that level of detail is Sean's email here: http://lists.openstack.org/pipermail/openstack-dev/2014-August/042277.html Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] The future of the integrated release
On Thu, 2014-08-07 at 09:30 -0400, Sean Dague wrote: While I definitely think re-balancing our quality responsibilities back into the projects will provide an overall better release, I think it's going to take a long time before it lightens our load to the point where we get more breathing room again. I'd love to hear more about this re-balancing idea. It sounds like we have some concrete ideas here and we're saying they're not relevant to this thread because they won't be an immediate solution? This isn't just QA issues, it's a coordination issue on overall consistency across projects. Something that worked fine at 5 integrated projects, got strained at 9, and I think is completely untenable at 15. I can certainly relate to that from experience with Oslo. But if you take a concrete example - as more new projects emerged, it became harder to get them all using oslo.messaging and using it in consistent ways. That's become a lot better with Doug's idea of Oslo project delegates. But if we had not added those projects to the release, the only reason that the problem would be more manageable is that the use of oslo.messaging would effectively become a requirement for integration. So, projects requesting integration have to take cross-project responsibilities more seriously for fear their application would be denied. That's a very sad conclusion. Our only tool for encouraging people to take this cross-project issue seriously is being accepted into the release and, once achieved, the cross-project responsibilities aren't taken so seriously? I don't think it's as bleak as that - given the proper support, direction and tracking, I think we're seeing in Oslo how projects will play their part in getting to cross-project consistency. I think one of the big issues with a large number of projects is that the implications of one project's implementation decisions impact others, but people don't always realize it. Locally correct decisions for each project may not be globally correct for OpenStack. The GBP discussion, the Rally discussion, all are flavors of this. I think we need two things here - good examples of how these cross-project initiatives can succeed so people can learn from them, and for the initiatives themselves to be patiently led by those whose goal is a cross-project solution. It's hard work, absolutely no doubt. The point again, though, is that it is possible to do this type of work in such a way that once a small number of projects adopt the approach, most of the others will follow quite naturally. If I was trying to get a consistent cross-project approach in a particular area, the least of my concerns would be whether Ironic, Marconi, Barbican or Designate would be willing to fall in line behind a cross-project consensus. People are frustrated by infra load, for instance. It's probably worth noting that the 'config' repo currently has more commits landed than any other project in OpenStack besides 'nova' in this release. It has 30% of the core team size of Nova (http://stackalytics.com/?metric=commits). Yes, infra is an extremely busy project. I'm not sure I'd compare infra/config commits to Nova commits in order to illustrate that, though. Infra is a massive endeavor, it's as critical a part of the project as any project in the integrated release, and like other strategic efforts it struggles to attract contributors from as diverse a range of companies as the integrated projects.
So I do think we need to really think about what *must* be in OpenStack for it to be successful, and ensure that story is well thought out, and that the pieces which provide those features in OpenStack are clearly best of breed, so they are deployed in all OpenStack deployments, and can be counted on by users of OpenStack. I do think we try hard to think this through, but no doubt we need to do better. Is this conversation concrete enough to really move our thinking along sufficiently, though? Because if every version of OpenStack deploys with a different Auth API (an example that's current but going away), we can't grow an ecosystem of tools around it. There's a nice concrete example, but it's going away? What's the best current example to talk through? This is organic definition of OpenStack through feedback with operators and developers on what's minimum needed and currently working well enough that people are happy to maintain it. And make that solid. Having a TC that is independently selected separate from the PTLs allows that group to try to make some holistic calls here. At the end of the day, that's probably going to mean saying No to more things. Every time I turn around everyone wants the TC to say No to things, just not to their particular thing. :) Which is human nature. But I think if we don't start saying No to more things we're going to end up with a pile of mud that no one is happy with. That we're being so abstract about all of this is frustrating. I get that no-one wants to start a
Re: [openstack-dev] [all] The future of the integrated release
On Fri, 2014-08-08 at 15:36 -0700, Devananda van der Veen wrote: On Tue, Aug 5, 2014 at 10:02 AM, Monty Taylor mord...@inaugust.com wrote: Yes. Additionally, and I think we've been getting better at this in the 2 cycles that we've had an all-elected TC, I think we need to learn how to say no on technical merit - and we need to learn how to say "thank you for your effort, but this isn't working out". Breaking up with someone is hard to do, but sometimes it's best for everyone involved. I agree. The challenge is scaling the technical assessment of projects. We're all busy, and digging deeply enough into a new project to make an accurate assessment of it is time consuming. Sometimes, there are impartial subject-matter experts who can spot problems very quickly, but how do we actually gauge fitness? Yes, it's important the TC does this and it's obvious we need to get a lot better at it. The Marconi architecture threads are an example of us trying harder (and kudos to you for taking the time), but it's a little disappointing how it has turned out. On the one hand there's what seems like a "this doesn't make any sense" gut feeling and on the other hand an earnest, but hardly bite-sized, justification for how the API was chosen and how it led to the architecture. It's frustrating that this appears not to be resulting in either improved shared understanding or improved architecture. Yet everyone is trying really hard. Letting the industry field-test a project and feed their experience back into the community is a slow process, but that is the best measure of a project's success. I seem to recall this being an implicit expectation a few years ago, but haven't seen it discussed in a while. I think I recall us discussing a "must have feedback that it's successfully deployed" requirement in the last cycle, but we recognized that deployers often wait until a project is integrated. I'm not suggesting we make a policy of it, but if, after a few cycles, a project is still not meeting the needs of users, I think that's a very good reason to free up the hold on that role within the stack so other projects can try and fill it (assuming that is even a role we would want filled). I'm certainly not against discussing de-integration proposals. But I could imagine a case for de-integrating every single one of our integrated projects. None of our software is perfect. How do we make sure we approach this sanely, rather than run the risk of someone starting a witch hunt because of a particular pet peeve? I could imagine a really useful dashboard showing the current state of projects along a bunch of different lines - summary of latest deployments data from the user survey, links to known scalability issues, limitations that operators should take into account, some capturing of trends so we know whether things are improving. All of this data would be useful to the TC, but also hugely useful to operators. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] The future of the integrated release
On Tue, 2014-08-12 at 14:26 -0400, Eoghan Glynn wrote: It seems like this is exactly what the slots give us, though. The core review team picks a number of slots indicating how much work they think they can actually do (less than the available number of blueprints), and then blueprints queue up to get a slot based on priorities and turnaround time and other criteria that try to make slot allocation fair. By having the slots, not only is the review priority communicated to the review team, it is also communicated to anyone watching the project. One thing I'm not seeing shine through in this discussion of slots is whether any notion of individual cores, or small subsets of the core team with aligned interests, can champion blueprints that they have a particular interest in. For example it might address some pain-point they've encountered, or impact on some functional area that they themselves have worked on in the past, or line up with their thinking on some architectural point. But for whatever motivation, such small groups of cores currently have the freedom to self-organize in a fairly emergent way and champion individual BPs that are important to them, simply by *independently* giving those BPs review attention. Whereas under the slots initiative, presumably this power would be subsumed by the group will, as expressed by the prioritization applied to the holding pattern feeding the runways? I'm not saying this is good or bad, just pointing out a change that we should have our eyes open to. Yeah, I'm really nervous about that aspect. Say a contributor proposes a new feature, a couple of core reviewers think it's important and exciting enough for them to champion it, but somehow the 'group will' is that it's not a high enough priority for this release, even if everyone agrees that it is actually cool and useful. What does imposing that 'group will' on the two core reviewers and the contributor achieve? That the contributor and reviewers will happily turn their attention to some of the higher priority work? Or we lose a contributor and two reviewers because they feel disenfranchised? Probably somewhere in the middle. On the other hand, what happens if work proceeds ahead even if not deemed a high priority? I don't think we can say that the contributor and two core reviewers were distracted from higher priority work, because blocking this work is probably unlikely to shift their focus in a productive way. Perhaps other reviewers are distracted because they feel the work needs more oversight than just the two core reviewers? It places more of a burden on the gate? I dunno ... the consequences of imposing group will worry me more than the consequences of allowing small groups to self-organize like this. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] The future of the integrated release
On Tue, 2014-08-12 at 14:12 -0700, Joe Gordon wrote: Here is the full nova proposal on Blueprint in Kilo: Runways and Project Priorities https://review.openstack.org/#/c/112733/ http://docs-draft.openstack.org/33/112733/4/check/gate-nova-docs/5f38603/doc/build/html/devref/runways.html Thanks again for doing this. Four points in the discussion jump out at me. Let's see if I can paraphrase without misrepresenting :) - ttx - we need tools to be able to visualize these runways - danpb - the real problem here is that we don't have good tools to help reviewers maintain a todo list which feeds, in part, off blueprint prioritization - eglynn - what are the implications for our current ability for groups within the project to self-organize? - russellb - why is this different from reviewers sponsoring blueprints, and how will it work better? I've been struggling to articulate a tooling idea for a while now. Let me try again based on the runways idea and the thoughts above ... When a reviewer sits down to do some reviews, their goal should be to work through the small number of runways they're signed up to and drive the list of reviews that need their attention to zero. Reviewers should be able to create their own runways and allow others to sign up to them. The reviewers responsible for that runway are responsible for pulling new reviews from explicitly defined feeder runways. Some feeder runways could be automated; no more than a search query for, say, new libvirt patches which aren't already in the libvirt driver runway. All of this activity should be visible to everyone. It should be possible to look at all the runways, see what runways a patch is in, understand the flow between runways, etc. There's a lot of detail that would have to be worked out, but I'm pretty convinced there's an opportunity to carve up the review backlog, empower people to help out with managing the backlog, give reviewers manageable queues for them to stay on top of, help ensure that project prioritization is one of the drivers of reviewer activity, and increase contributor visibility into how decisions are made. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
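To make the runway/feeder idea above slightly more concrete, here is a rough sketch of how a runway might be modelled. None of these class names, methods or query strings exist in any real tool - they are purely illustrative:

    # Purely illustrative sketch of the runway/feeder idea; not a proposal
    # for an actual implementation.
    class Runway(object):
        def __init__(self, name, reviewers, feeder_query=None, max_size=10):
            self.name = name
            self.reviewers = reviewers        # who has signed up to this runway
            self.feeder_query = feeder_query  # optional automated feeder
            self.max_size = max_size          # keep the queue manageable
            self.reviews = []

        def pull_from_feeder(self, gerrit):
            # e.g. feeder_query could be something like
            # 'project:openstack/nova file:^nova/virt/libvirt/.* status:open'
            if not self.feeder_query:
                return
            for change in gerrit.query(self.feeder_query):
                if len(self.reviews) >= self.max_size:
                    break
                if change not in self.reviews:
                    self.reviews.append(change)

The point of the sketch is only that each runway is a small, bounded queue with named owners, and that feeder queries do the triage work automatically where possible.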
Re: [openstack-dev] [nova] so what do i do about libvirt-python if i'm on precise?
On Wed, 2014-08-13 at 10:26 +0100, Daniel P. Berrange wrote: On Tue, Aug 12, 2014 at 10:09:52PM +0100, Mark McLoughlin wrote: On Wed, 2014-07-30 at 15:34 -0700, Clark Boylan wrote: On Wed, Jul 30, 2014, at 03:23 PM, Jeremy Stanley wrote: On 2014-07-30 13:21:10 -0700 (-0700), Joe Gordon wrote: While forcing people to move to a newer version of libvirt is doable on most environments, do we want to do that now? What is the benefit of doing so? [...] The only dog I have in this fight is that using the split-out libvirt-python on PyPI means we finally get to run Nova unit tests in virtualenvs which aren't built with system-site-packages enabled. It's been a long-running headache which I'd like to see eradicated everywhere we can. I understand though if we have to go about it more slowly, I'm just excited to see it finally within our grasp. -- Jeremy Stanley We aren't quite forcing people to move to newer versions. Only those installing nova test-requirements need newer libvirt. Yeah, I'm a bit confused about the problem here. Is it that people want to satisfy test-requirements through packages rather than using a virtualenv? (i.e. if people just use virtualenvs for unit tests, there's no problem right?) If so, is it possible/easy to create new, alternate packages of the libvirt python bindings (from PyPI) on their own separately from the libvirt.so and libvirtd packages? The libvirt python API is (mostly) automatically generated from a description of the XML that is built from the C source files. In tree we have fakelibvirt, which is a semi-crappy attempt to provide a pure python libvirt client API with the same signature. IIUC, what you are saying is that we should get a better fakelibvirt that is truly identical, with the same API coverage/signatures as real libvirt? No, I'm saying that people are installing packaged versions of recent releases of python libraries. But they're skeptical about upgrading their libvirt packages. With the work done to enable libvirt to be uploaded to PyPI, can't the two be decoupled? Can't we have packaged versions of the recent python bindings on PyPI that are independent of the base packages containing libvirt.so and libvirtd? (Or I could be completely misunderstanding the issue people are seeing) Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
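For what it's worth, the fallback pattern being discussed boils down to something like the following. The module path of the fake is an assumption for illustration, not necessarily Nova's actual layout:

    # Illustrative only - the fakelibvirt import path is an assumption.
    try:
        # the split-out libvirt-python bindings, installed into the
        # virtualenv from PyPI via test-requirements
        import libvirt
    except ImportError:
        # a pure-python stand-in with (ideally) the same API signatures
        from nova.tests.virt.libvirt import fakelibvirt as libvirt

Whether the PyPI bindings can be fully decoupled from the libvirt.so/libvirtd packages on the host is exactly the open question in the exchange above.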
Re: [openstack-dev] [nova] stable branches failure to handle review backlog
On Tue, 2014-07-29 at 14:04 +0200, Thierry Carrez wrote: Ihar Hrachyshka wrote: On 29/07/14 12:15, Daniel P. Berrange wrote: Looking at the current review backlog I think that we have to seriously question whether our stable branch review process in Nova is working to an acceptable level On Havana - 43 patches pending - 19 patches with a single +2 - 1 patch with a -1 - 0 patches with a -2 - Stalest waiting 111 days since most recent patch upload - Oldest waiting 250 days since first patch upload - 26 patches waiting more than 1 month since most recent upload - 40 patches waiting more than 1 month since first upload On Icehouse: - 45 patches pending - 17 patches with a single +2 - 4 patches with a -1 - 1 patch with a -2 - Stalest waiting 84 days since most recent patch upload - Oldest waiting 88 days since first patch upload - 10 patches waiting more than 1 month since most recent upload - 29 patches waiting more than 1 month since first upload I think those stats paint a pretty poor picture of our stable branch review process, particularly Havana. It should not take us 250 days for our review team to figure out whether a patch is suitable material for a stable branch, nor should we have nearly all the patches waiting more than 1 month in Havana. These branches are not getting sufficient reviewer attention and we need to take steps to fix that. If I had to set a benchmark, assuming CI passes, I'd expect us to either approve or reject submissions for stable within a 2 week window in the common case, 1 month at the worst case. Totally agreed. A bit of history. At the dawn of time there were no OpenStack stable branches, each distribution was maintaining its own stable branches, duplicating the backporting work. I'm not sure how much backporting was going on at the time of the Essex summit. I'm sure Ubuntu had some backports, but that was probably about it? At some point it was suggested (mostly by RedHat and Canonical folks) that there should be collaboration around that task, and the OpenStack project decided to set up official stable branches where all distributions could share the backporting work. The stable team group was seeded with package maintainers from all over the distro world. During that first design summit session, it was mainly you, me and Daviey discussing. Both you and Daviey saw this primarily as being about distros collaborating, but I never saw it that way. I don't see how any self-respecting open-source project can throw a release over the wall and have no ability to address critical bugs with that release until the next release 6 months later which will also include a bunch of new feature work with new bugs. That's not a distro maintainer point of view. At that Essex summit, we were lamenting how many critical bugs in Nova had been discovered shortly after the Diablo release. Our inability to do a bugfix release of Nova for Diablo seemed like a huge problem to me. So these branches originally only exist as a convenient place to collaborate on backporting work. This is completely separate from development work, even if these days backports are often proposed by developers themselves. The stable branch team is separate from the rest of OpenStack teams. We have always been very clear that if the stable branches are no longer maintained (i.e. if the distributions don't see the value of those anymore), then we'll consider removing them. We, as a project, only signed up to support those as long as the distros wanted them.
You can certainly argue that the project never signed up for the responsibility. I don't see it that way, but there was certainly always a debate whether this was the project taking responsibility for bugfix releases or whether it was just downstream distros collaborating. The thing about branches going away if they're not maintained isn't anything unusual. If *any* effort within the project becomes so unmaintained due to a lack of interest such that we can't stand over it, then we should consider retiring it. We have been adding new members to the stable branch teams recently, but those tend to come from development teams rather than downstream distributions, and that starts to bend the original landscape. Basically, the stable branch needs to be very conservative to be a source of safe updates -- downstream distributions understand the need to weigh the benefit of the patch vs. the disruption it may cause. Developers have another type of incentive, which is to get the fix they worked on into stable releases, without necessarily being very conservative. Adding more -core people to the stable team to compensate the absence of distro maintainers will ultimately kill those branches. That's quite a leap to say that -core team members will be so incapable of the appropriate level of conservatism that the branch will be
Re: [openstack-dev] [all] 3rd Party CI vs. Gerrit
On Wed, 2014-08-13 at 12:05 -0700, James E. Blair wrote: cor...@inaugust.com (James E. Blair) writes: Sean Dague s...@dague.net writes: This has all gone far enough that someone actually wrote a Grease Monkey script to purge all the 3rd Party CI content out of the Jenkins UI. People are writing mail filters to dump all the notifications. Dan Berrange filters them all out of his gerrit query tools. I should also mention that there is a pending change to do something similar via site-local Javascript in our Gerrit: https://review.openstack.org/#/c/95743/ I don't think it's an ideal long-term solution, but if it works, we may have some immediate relief without all having to install greasemonkey scripts. You may have noticed that this has merged, along with a further change that shows the latest results in a table format. (You may need to force-reload in your browser to see the change.) Beautiful! Thank you so much to everyone involved. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] fair standards for all hypervisor drivers
On Mon, 2014-08-11 at 15:25 -0700, Joe Gordon wrote: On Sun, Aug 10, 2014 at 11:59 PM, Mark McLoughlin mar...@redhat.com wrote: On Fri, 2014-08-08 at 09:06 -0400, Russell Bryant wrote: On 08/07/2014 08:06 PM, Michael Still wrote: It seems to me that the tension here is that there are groups who would really like to use features in newer libvirts that we don't CI on in the gate. Is it naive to think that a possible solution here is to do the following: - revert the libvirt version_cap flag I don't feel strongly either way on this. It seemed useful at the time for being able to decouple upgrading libvirt and enabling features that come with that. Right, I suggested the flag as a more deliberate way of avoiding the issue that was previously seen in the gate with live snapshots. I still think it's a pretty elegant and useful little feature, and don't think we need to use it as a proxy battle over testing requirements for new libvirt features. Mark, I am not sure if I follow. The gate issue with live snapshots has been worked around by turning it off [0], so presumably this patch is forward facing. I fail to see how this patch is needed to help the gate in the future. On the live snapshot issue specifically, we disabled it by requiring 1.3.0 for the feature. With the version cap set to 1.2.2, we won't automatically enable this code path again if we update to 1.3.0. No question that's a bit of a mess, though. The point was a more general one - we learned from the live snapshot issue that having a libvirt upgrade immediately enable new code paths was a bad idea. The patch is a simple, elegant way of avoiding that. Wouldn't it just delay the issues until we change the version_cap? Yes, that's the idea. Rather than having to scramble when the new devstack-gate image shows up, we'd be able to work on any issues in the context of a patch series to bump the version_cap. The issue I see with the libvirt version_cap [1] is best captured in its commit message: The end user can override the limit if they wish to opt-in to use of untested features via the 'version_cap' setting in the 'libvirt' group. This goes against the very direction nova has been moving in for some time now. We have been moving away from merging untested (re: no integration testing) features. This patch changes the very direction the project is going in over testing without so much as a discussion. While I think it may be time that we revisited this discussion, the discussion needs to happen before any patches are merged. You put it well - some apparently see us moving towards a zero-tolerance policy of not having any code which isn't functionally tested in the gate. That obviously is not the case right now. The sentiment is great, but any zero-tolerance policy is dangerous. I'm very much in favor of discussing this further. We should have some principles and goals around this, but rather than argue this in the abstract we should be open to discussing the tradeoffs involved with individual patches. I am less concerned about the contents of this patch, and more concerned with how such a big de facto change in nova policy (we accept untested code sometimes) was made without any discussion or consensus. In your comment on the revert [2], you say the 'whether not-CI-tested features should be allowed to be merged' debate is 'clearly unresolved.' How did you get to that conclusion? This was never brought up in the mid-cycles as an unresolved topic to be discussed.
In our specs template we say Is this untestable in gate given current limitations (specific hardware / software configurations available)? If so, are there mitigation plans (3rd party testing, gate enhancements, etc) [3]. We have been blocking untested features for some time now. Asking is this tested in a spec template makes a tonne of sense. Requiring some thought to be put into mitigation where a feature is untestable in the gate makes sense. Requiring that the code is tested where possible makes sense. It's a zero-tolerance get your code functionally tested or GTFO policy that I'm concerned about. I am further perplexed by what Daniel Berrange, the patch author, meant when he commented [2] Regardless of the outcome of the testing discussion we believe this is a useful feature to have. Who is 'we'? Because I don't see how that can be nova-core or even nova-specs-core, especially considering how many members of those groups are +2 on the revert. So if 'we' is neither of those groups then who is 'we'? That's for Dan to answer, but I think you're either nitpicking or have a very serious concern. If nitpicking, Dan could just be using the Royal 'We' :) Or he could just mean
[openstack-dev] [nova] Retrospective veto revert policy
Hey (Terrible name for a policy, I know) From the version_cap saga here: https://review.openstack.org/110754 I think we need a better understanding of how to approach situations like this. Here's my attempt at documenting what I think we're expecting the procedure to be: https://etherpad.openstack.org/p/nova-retrospective-veto-revert-policy If it sounds reasonably sane, I can propose its addition to the Development policies doc. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] so what do i do about libvirt-python if i'm on precise?
On Wed, 2014-07-30 at 15:34 -0700, Clark Boylan wrote: On Wed, Jul 30, 2014, at 03:23 PM, Jeremy Stanley wrote: On 2014-07-30 13:21:10 -0700 (-0700), Joe Gordon wrote: While forcing people to move to a newer version of libvirt is doable on most environments, do we want to do that now? What is the benefit of doing so? [...] The only dog I have in this fight is that using the split-out libvirt-python on PyPI means we finally get to run Nova unit tests in virtualenvs which aren't built with system-site-packages enabled. It's been a long-running headache which I'd like to see eradicated everywhere we can. I understand though if we have to go about it more slowly, I'm just excited to see it finally within our grasp. -- Jeremy Stanley We aren't quite forcing people to move to newer versions. Only those installing nova test-requirements need newer libvirt. Yeah, I'm a bit confused about the problem here. Is it that people want to satisfy test-requirements through packages rather than using a virtualenv? (i.e. if people just use virtualenvs for unit tests, there's no problem right?) If so, is it possible/easy to create new, alternate packages of the libvirt python bindings (from PyPI) on their own separately from the libvirt.so and libvirtd packages? Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Nominating Jay Pipes for nova-core
On Wed, 2014-07-30 at 14:02 -0700, Michael Still wrote: Greetings, I would like to nominate Jay Pipes for the nova-core team. Jay has been involved with nova for a long time now. He's previously been a nova core, as well as a glance core (and PTL). He's been around so long that there are probably other types of core status I have missed. Please respond with +1s or any concerns. Was away, but +1 for the record. Would have been happy to see this some time ago. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] fair standards for all hypervisor drivers
On Fri, 2014-08-08 at 09:06 -0400, Russell Bryant wrote: On 08/07/2014 08:06 PM, Michael Still wrote: It seems to me that the tension here is that there are groups who would really like to use features in newer libvirts that we don't CI on in the gate. Is it naive to think that a possible solution here is to do the following: - revert the libvirt version_cap flag I don't feel strongly either way on this. It seemed useful at the time for being able to decouple upgrading libvirt and enabling features that come with that. Right, I suggested the flag as a more deliberate way of avoiding the issue that was previously seen in the gate with live snapshots. I still think it's a pretty elegant and useful little feature, and don't think we need to use it as proxy battle over testing requirements for new libvirt features. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
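For context, the flag being discussed is essentially a ceiling applied when deciding whether a libvirt-version-gated code path may be used, so that upgrading libvirtd on the host doesn't silently flip behaviour. A simplified sketch of the idea follows - this is not the actual Nova driver code, just an illustration:

    # Simplified sketch of the version_cap idea; not Nova's actual code.
    def _version_to_int(ver):
        major, minor, micro = ver
        return major * 1000000 + minor * 1000 + micro

    def has_min_version(conn_version, minimum, version_cap=None):
        # conn_version: integer version reported by the libvirt connection
        # minimum: (major, minor, micro) the feature needs
        # version_cap: operator-configured ceiling, or None for no cap
        effective = conn_version
        if version_cap is not None:
            effective = min(conn_version, _version_to_int(version_cap))
        return effective >= _version_to_int(minimum)

    # e.g. only use live snapshots once libvirt >= 1.3.0 *and* the
    # deployment has opted in by raising (or unsetting) the cap
    MIN_LIBVIRT_LIVESNAPSHOT_VERSION = (1, 3, 0)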
Re: [openstack-dev] [nova] fair standards for all hypervisor drivers
On Thu, 2014-07-17 at 09:58 +0100, Daniel P. Berrange wrote: On Thu, Jul 17, 2014 at 08:46:12AM +1000, Michael Still wrote: Top posting to the original email because I want this to stand out... I've added this to the agenda for the nova mid cycle meetup, I think most of the contributors to this thread will be there. So, if we can nail this down here then that's great, but if we think we'd be more productive in person chatting about this then we have that option too. FYI, I'm afraid I won't be at the mid-cycle meetup since it clashed with my being on holiday. So I'd really prefer if we keep the discussion on this mailing list where everyone has a chance to participate. Same here. Pre-arranged vacation, otherwise I'd have been there. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] fair standards for all hypervisor drivers
On Wed, 2014-07-16 at 16:15 +0200, Sean Dague wrote: .. Based on these experiences, libvirt version differences seem to be as substantial as major hypervisor differences. There is a proposal here - https://review.openstack.org/#/c/103923/ to hold newer versions of libvirt to the same standard we hold xen, vmware, hyperv, docker, ironic, etc. That's a bit of a mis-characterization - in terms of functional test coverage, the libvirt driver is the bar that all the other drivers struggle to meet. And I doubt any of us pay too close attention to the feature coverage that the 3rd party CI test jobs have. I'm somewhat concerned that the -2 pile on in this review is a double standard of libvirt features, and features exploiting really new upstream features. I feel like a lot of the language being used here about the burden of doing this testing is exactly the same as was presented by the docker team before their driver was removed, which was ignored by the Nova team at the time. Personally, I wasn't very comfortable with the docker driver move. It certainly gave an outward impression that we're an unfriendly community. The mitigating factor was that a lot of friendly, collaborative, coaching work went on in the background for months. Expectations were communicated well in advance. Kicking the docker driver out of the tree has resulted in an uptick in the amount of work happening on it, but I suspect most people involved have a bad taste in their mouths. I guess there's incentives at play which mean they'll continue plugging away at it, but those incentives aren't always at play. It was the concern by the freebsd team, which was also ignored and they were told to go land libvirt patches instead. I'm ok with us as a project changing our mind and deciding that the test bar needs to be taken down a notch or two because it's too burdensome to contributors and vendors, but if we are doing that, we need to do it for everyone. A lot of other organizations have put a ton of time and energy into this, and are carrying a maintenance cost of running these systems to get results back in a timely basis. I don't agree that we need to apply the same rules equally to everyone. At least part of the reasoning behind the emphasis on 3rd party CI testing was that projects (Neutron in particular) were being overwhelmed by contributions to drivers from developers who never contributed in any way to the core. The corollary of that is the contributors who do contribute to the core should be given a bit more leeway in return. There's a natural building of trust and element of human relationships here. As a reviewer, you learn to trust contributors with a good track record and perhaps prioritize contributions from them. As we seem deadlocked in the review, I think the mailing list is probably a better place for this. If we want to reduce the standards for libvirt we should reconsider what's being asked of 3rd party CI teams, and things like the docker driver, as well as the A, B, C driver classification. Because clearly libvirt 1.2.5+ isn't actually class A supported. No, there are features or code paths of the libvirt 1.2.5+ driver that aren't as well tested as the class A designation implies. And we have a proposal to make sure these aren't used by default: https://review.openstack.org/107119 i.e. to stray off the class A path, an operator has to opt into it by changing a configuration option that explains they will be enabling code paths which aren't yet tested upstream. 
These features have value to some people now, they don't risk regressing the class A driver and there's a clear path to them being elevated to class A in time. We should value these contributions and nurture these contributors. Appending some of my comments from the review below. The tl;dr is that I think we're losing sight of the importance of welcoming and nurturing contributors, and valuing whatever contributions they can make. That terrifies me. Mark. --- Compared to other open source projects, we have done an awesome job in OpenStack of having good functional test coverage. Arguably, given the complexity of the system, we couldn't have got this far without it. I can take zero credit for any of it. However, not everything is tested now, nor are the tests we have foolproof. When you consider the number of configuration options we have, the supported distros, the ranges of library versions we claim to support, etc., etc., I don't think we can ever get to an "everything is tested" point. In the absence of that, I think we should aim to be more clear about what *is* tested. The config option I suggest does that, which is a big part of its merit IMHO. We've had some success with the "be nasty enough to driver contributors and they'll do what we want" approach so far, but IMHO that was an exceptional approach for an exceptional situation - drivers that were completely broken, and driver developers who didn't contribute to the core
Re: [openstack-dev] [all] Treating notifications as a contract
On Fri, 2014-07-11 at 10:04 +0100, Chris Dent wrote: On Fri, 11 Jul 2014, Lucas Alvares Gomes wrote: The data format that Ironic will send was part of the spec proposed and could have been reviewed. I think there's still time to change it tho, if you have a better format talk to Haomeng, who is the person responsible for that work in Ironic, and see if he can change it (We can put up a following patch to fix the spec with the new format as well). But we need to do this ASAP because we want to get it landed in Ironic soon. It was only after doing the work that I realized how it might be an example for the sake of this discussion. As the architecture of Ceilometer currently exists there still needs to be some measure of custom code, even if the notifications are as I described them. However, if we want to take this opportunity to move some of the smarts from Ceilometer into the Ironic code then the paste that I created might be a guide to make it possible: http://paste.openstack.org/show/86071/ So you're proposing that all payloads should contain something like:

    'events': [
        # one or more dicts with something like
        {
            # some kind of identifier for the type of event
            'class': 'hardware.ipmi.temperature',
            'type': '#thing that indicates threshold, discrete, cumulative',
            'id': 'DIMM GH VR Temp (0x3b)',
            'value': '26',
            'unit': 'C',
            'extra': { ... }
        }
    ]

i.e. a class, type, id, value, unit and a space to put additional metadata. On the subject of notifications as a contract, calling the additional metadata field 'extra' suggests to me that there are no stability promises being made about those fields. Was that intentional? However on that however, if there's some chance that a large change could happen, it might be better to wait, I don't know. Unlikely that a larger change will be made in Juno - take the small window of opportunity to rationalize Ironic's payload, IMHO. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
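The appeal of a regular structure like that, from the consuming side, is that a handler needs no per-driver knowledge. A hedged sketch of what such a consumer might look like - this is not Ceilometer's actual code, just an illustration built on the field names in the example payload above:

    # Hedged sketch - the handler is generic precisely because the payload
    # is regular; nothing here is Ceilometer's actual code.
    def handle_notification(payload):
        samples = []
        for event in payload.get('events', []):
            samples.append({
                'name': event['class'],
                'resource': event['id'],
                'value': event['value'],
                'unit': event.get('unit'),
                'kind': event['type'],
            })
        return samples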
Re: [openstack-dev] [all] Treating notifications as a contract
On Thu, 2014-07-10 at 16:21 -0400, Eoghan Glynn wrote: One of the issues that has been raised in the recent discussions with the QA team about branchless Tempest relates to some legacy defects in the OpenStack notification system. Got links to specifics? I thought the consensus was that there was a contract here which we need to maintain, so I'd be curious where that broke down. Well I could go digging in the LP fossil-record for specific bugs, but it's late, so for now I'll simply appeal to anecdata and tribal memory of ceilometer being broken by notification changes on the nova side. Versioning, and the ability to move to newer contract versions, would be good too, but in the absence of such things we should maintain backwards compat. Yes, I think that was the aspiration, but not always backed up by practice in reality. The reason I ask about specifics is to figure out which is more important - versioned payloads, or automated testing of payload format. i.e. have we been accidentally or purposefully changing the format? If the latter, would the change have warranted a new incompatible version of the payload format? Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] REST API access to configuration options
On Tue, 2014-07-15 at 08:54 +0100, Henry Nash wrote: HI As the number of configuration options increases and OpenStack installations become more complex, the chances of incorrect configuration increases. There is no better way of enabling cloud providers to be able to check the configuration state of an OpenStack service than providing a direct REST API that allows the current running values to be inspected. Having an API to provide this information becomes increasingly important for dev/ops style operation. As part of Keystone we are considering adding such an ability (see: https://review.openstack.org/#/c/106558/). However, since this is the sort of thing that might be relevant to and/or affect other projects, I wanted to get views from the wider dev audience. Any such change obviously has to take security in mind - and as the spec says, just like when we log config options, any options marked as secret will be obfuscated. In addition, the API will be protected by the normal policy mechanism and is likely in most installations to be left as admin required. And of course, since it is an extension, if a particular installation does not want to use it, they don't need to load it. Do people think this is a good idea? Useful in other projects? Concerned about the risks? I would have thought operators would be comfortable gleaning this information from the log files? Also, this is going to tell you how the API service you connected to was configured. Where there are multiple API servers, what about the others? How do operators verify all of the API servers behind a load balancer with this? And in the case of something like Nova, what about the many other nodes behind the API server? Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
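On the gleaning it from the log files point: oslo.config already lets a service dump its effective configuration at startup, which is typically how operators capture this today. Assuming the service wires it up something like the following (each project differs in exactly where it does this), the running values end up in the logs with secret options obfuscated:

    # Illustrative - assumes the service calls this at startup.
    import logging
    from oslo.config import cfg

    LOG = logging.getLogger(__name__)
    CONF = cfg.CONF

    def log_effective_config():
        # writes every registered option and its current value to the logs;
        # options registered with secret=True are obfuscated
        CONF.log_opt_values(LOG, logging.DEBUG)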
Re: [openstack-dev] REST API access to configuration options
On Tue, 2014-07-15 at 13:00 +0100, Henry Nash wrote: Mark, Thanks for your comments (as well as remarks on the WIP code-review). So clearly gathering and analysing log files is an alternative approach, perhaps not as immediate as an API call. In general, I believe that the more capability we provide via easy-to-consume APIs (with appropriate permissions) the more effective (and innovative) ways of management of OpenStack we will achieve (easier to build automated management systems). I'm skeptical - like Joe says, this is a general problem and management tooling will have generic ways of tackling this without using a REST API. In terms of multi API servers, obviously each server would respond to the API with the values it has set, so operators could check any or all of the servers, and this actually becomes more important as people distribute config files around to the various servers (since there's more chance of something getting out of sync). The fact that it only deals with API servers, and that you need to bypass the load balancer in order to iterate over all API servers, makes this of very limited use IMHO. Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
On Thu, 2014-07-03 at 16:27 +0100, Mark McLoughlin wrote: Hey This is an attempt to summarize a really useful discussion that Victor, Flavio and I have been having today. At the bottom are some background links - basically what I have open in my browser right now thinking through all of this. We're attempting to take baby-steps towards moving completely from eventlet to asyncio/trollius. The thinking is for Ceilometer to be the first victim. I got a little behind on this thread, but maybe it'd be helpful to summarize some things from this good discussion: - Where/when was this decided?!? Victor is working on prototyping how an OpenStack service would move to using asyncio. Whether a move to asyncio across the board makes sense - and what exactly it would look like - hasn't been *decided*. The idea is merely being explored at this point. - Is moving to asyncio really a priority compared to other things? I think Victor has made a good case on "what's wrong with eventlet"[1] and, personally, I'm excited about the prospect of the Python community more generally converging on asyncio. Understanding what OpenStack would need in order to move to asyncio will help the asyncio effort more generally. Figuring through some of this stuff is a priority for Victor and others, but no-one is saying it's an immediate priority for the whole project. - Moving from an implicitly async to an explicitly async programming model has enormous implications and we need to figure out what it means for libraries like SQLAlchemy and abstraction layers like ORMs. I think that's well understood - the topic of this thread is merely how to make a small addition to oslo.messaging (the ability to dispatch asyncio coroutines on eventlet) so that we can move on to figuring out the next piece of the puzzle. - Some people are clearly skeptical about whether asyncio is the right thing for Python generally, whether it's the right thing for OpenStack, whatever. Personally, I'm optimistic but I don't find the conversation all that interesting right now - I want to see how the prototype efforts work out before making a call about whether it's feasible and useful. - Taskflow vs asyncio - good discussion, plenty to figure out. They're mostly orthogonal concerns IMHO but *maybe* we decide adopting both makes sense and that both should be adopted together. I'd like to see more concrete examples showing taskflow vs asyncio vs taskflow/asyncio to understand better. So, the tl;dr is that lots of work remains to even begin to understand how exactly asyncio could be adopted and whether that makes sense. The thread raises some interesting viewpoints, but I don't think it moves our understanding along all that much. The initial mail was simply about unlocking one very small piece of the puzzle. Mark. [1] - http://techs.enovance.com/6562/asyncio-openstack-python3 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
On Mon, 2014-07-07 at 12:48 +0200, Nikola Đipanov wrote: When I read all of this stuff and got my head around it (took some time :) ), a glaring drawback of such an approach, as I mentioned on the spec proposing it [1], is that we would not really be doing asyncio, we would just be pretending we are by using a subset of its APIs, and having all of the really important stuff for overall design of the code (code that needs to do IO in the callbacks for example) and ultimately - performance, completely unavailable to us when porting. So in Mark's example above:

    @asyncio.coroutine
    def foo(self):
        result = yield from some_async_op(...)
        return do_stuff(result)

A developer would not need to do anything that asyncio requires, like make sure that some_async_op() registers a callback with the eventloop (using for example event_loop.add_reader/writer methods) - you could just simply make it use a 'greened' call and things would continue working happily. Yes, Victor and I noticed this problem and wondered whether there was a way to e.g. turn off the monkey-patching at runtime in a single greenthread, or even just make any attempt to context switch raise an exception. i.e. a way to run the foo() coroutine above in a greenthread such that context switching is disallowed, or logged, or whatever while the function is running. The only way context switching would be allowed to happen would be if the coroutine yielded. I have a feeling this will in turn have a lot of people writing code that they don't understand, and as library writers - we are not doing an excellent job at that point. Now porting an OpenStack project to another IO library with a completely different design is a huge job and there is unlikely a single 'right' way to do it, so treat this as a discussion starter, that will hopefully give us a better understanding of the problem we are trying to tackle. So I hacked together a small POC of a different approach. In short - we actually use a real asyncio selector eventloop in a separate thread, and dispatch stuff to it when we figure out that our callback is in fact a coroutine. More will be clear from the code (warning - hacky code ahead): [2] I will probably be updating it - but if you just clone the repo, all the history is there. I wrote it without the oslo.messaging abstractions like listener and dispatcher, but it is relatively easy to see which bits of code would go in those. Several things are worth noting as you read the above. The first one is that we do not monkeypatch until we have fired off the asyncio thread (Victor correctly noticed this would be a problem in a comment on [1]). This may seem hacky (and it is) but if we decide to go further down this road - we would probably not be 'greening the world' but rather importing patched non-ported modules when we need to dispatch to them. This may sound like a big deal, and it is, but it is critical to actually running ported code in a real asyncio eventloop. I have not yet tested this further, but from briefly reading eventlet code - it seems like it should work. Another interesting problem is (as I have briefly mentioned in [1]) - what happens when we need to synchronize between eventlet-run and asyncio-run callbacks while we are in the process of porting. I don't have a good answer to that yet, but it is worth noting that the proposed approach doesn't either, and this is a thing we should have some idea about before going in with a knife.
Now for some marketing :) - I can see several advantages of such an approach, the obvious one being as stated, that we are in fact doing asyncio, so we are all in. Also as you can see [2] the implementation is far from magical - it's (surprisingly?) simple, and requires no other additional dependencies apart from trollius itself (granted greenio is not too complex either). I am sure that we would hit some other problems that were not clear from this basic POC (it was done in ~3 hours on a bus), but it seems to me that those problems will likely need to be solved anyhow if we are to port Ceilometer (or any other project) to asyncio, we will just hit them sooner this way. It was a fun approach to ponder anyway - so I am looking forward to comments and thoughts. It's an interesting idea and I'd certainly welcome a more detailed analysis of what the approach would mean for a service like Ceilometer. My instinct is that adding an additional native thread where there is only one native thread now will lead to tricky concurrency issues and a more significant change of behavior than with the greenio approach. The reason I like the greenio idea is that it allows us to make the programming model changes without very significantly changing what happens at runtime - the behavior, order of execution, concurrency concerns, etc. shouldn't be all that different. Mark. ___ OpenStack-dev mailing list
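For anyone trying to picture the separate-thread approach, its core boils down to something like the following. This is an illustration only, written in the yield-from style used elsewhere in the thread but with asyncio.run_coroutine_threadsafe(), a helper that is newer than the trollius-era API the actual POC was written against:

    # Illustration of the dedicated-asyncio-thread idea; not the POC code.
    import asyncio
    import threading

    loop = asyncio.new_event_loop()

    def _run_loop():
        asyncio.set_event_loop(loop)
        loop.run_forever()

    threading.Thread(target=_run_loop, daemon=True).start()

    @asyncio.coroutine
    def dispatch(message):
        # stand-in for a dispatcher invoking a coroutine endpoint
        yield from asyncio.sleep(0.1)
        return 'handled %s' % message

    # from the eventlet side of the house, hand the coroutine over to the
    # asyncio thread and block on the concurrent.futures.Future it returns
    future = asyncio.run_coroutine_threadsafe(dispatch('ping'), loop)
    print(future.result())

The concurrency concern raised above is visible even in this tiny sketch: anything shared between the eventlet side and the asyncio thread now needs real thread-safety, not just greenthread-safety.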
Re: [openstack-dev] [all] oslo.messaging 1.4.0.0a3 released
On Wed, 2014-07-09 at 03:53 +, Paul Michali (pcm) wrote: Mark, What is the status of adding the newer oslo.messaging releases to global requirements? I had tried to get 1.4.0.0a2 added to requirements (https://review.openstack.org/#/c/103536/), but it was failing Jenkins. Wondering how we get that version (or newer) into global requirements (some issue with pre-releases?). Yeah, I don't know what the latest status is with this beyond bandersnatch: http://lists.openstack.org/pipermail/openstack-dev/2014-July/039089.html https://review.openstack.org/#/q/project:openstack-infra/config+topic:bandersnatch,n,z https://review.openstack.org/103256 Mark. Thanks, PCM (Paul Michali) MAIL …..…. p...@cisco.com IRC ……..… pcm_ (irc.freenode.com) TW ………... @pmichali GPG Key … 4525ECC253E31A83 Fingerprint .. 307A 96BB 1A4C D2C7 931D 8D2D 4525 ECC2 53E3 1A83 On Jul 8, 2014, at 4:58 PM, Mark McLoughlin mar...@redhat.com wrote: The Oslo team is pleased to announce the release of oslo.messaging 1.4.0.0a3, another pre-release in the 1.4.0 series for oslo.messaging during the Juno cycle: https://pypi.python.org/pypi/oslo.messaging/1.4.0.0a3 oslo.messaging provides an API which supports RPC and notifications over a number of different messaging transports. Full details of the 1.4.0.0a3 release is available here: http://docs.openstack.org/developer/oslo.messaging/#a3 Please report problems using the oslo.messaging bug tracker: https://bugs.launchpad.net/oslo.messaging Thanks to all those who contributed to the release! Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
On Thu, 2014-07-10 at 04:48 -0400, Eoghan Glynn wrote: TL;DR: do we need to stabilize notifications behind a versioned and discoverable contract? Folks, One of the issues that has been raised in the recent discussions with the QA team about branchless Tempest relates to some legacy defects in the OpenStack notification system. Got links to specifics? I thought the consensus was that there was a contract here which we need to maintain, so I'd be curious where that broke down. Versioning, and the ability to move to newer contract versions, would be good too, but in the absence of such things we should maintain backwards compat. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
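If the versioned-contract route were taken, the minimal form of it is probably nothing more than stamping each payload and having consumers check the stamp before parsing. A sketch of the shape only - the field names here are illustrative, not an agreed contract:

    # Sketch only - field names are illustrative, not an agreed contract.
    NOTIFICATION = {
        'event_type': 'compute.instance.create.end',
        'payload_version': '1.0',    # bumped on incompatible changes
        'payload': {
            'instance_id': 'some-uuid',
        },
    }

    def consume(notification):
        major = int(notification.get('payload_version', '1.0').split('.')[0])
        if major != 1:
            raise ValueError('unsupported payload version')
        return notification['payload']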
[openstack-dev] [all] oslo.config 1.4.0.0a2 released
The Oslo team is pleased to announce the release of oslo.config 1.4.0.0a2, another pre-release in the 1.4.0 series for oslo.config during the Juno cycle: https://pypi.python.org/pypi/oslo.config/1.4.0.0a2 oslo.config provides an API which supports parsing command line arguments and .ini style configuration files. Full details of the 1.4.0.0a2 release are available here: http://docs.openstack.org/developer/oslo.config/#a2 Please report problems using the oslo bug tracker: https://bugs.launchpad.net/oslo Thanks to all those who contributed to the release! Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
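For anyone unfamiliar with the library, typical usage looks something like this minimal example (not specific to the a2 release):

    # Minimal oslo.config usage example.
    import sys
    from oslo.config import cfg

    opts = [
        cfg.StrOpt('bind_host', default='0.0.0.0',
                   help='Address to bind the server to'),
        cfg.IntOpt('bind_port', default=9292,
                   help='Port to listen on'),
    ]

    CONF = cfg.CONF
    CONF.register_opts(opts)

    # parses --config-file / --config-dir and reads values from the
    # configured .ini style files
    CONF(sys.argv[1:], project='example')
    print(CONF.bind_host, CONF.bind_port)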
Re: [openstack-dev] Policy around Requirements Adds (was: New class of requirements for Stackforge projects)
On Mon, 2014-07-07 at 16:46 -0400, Sean Dague wrote: This thread was unfortunately hidden under a project specific tag (I have thus stripped all the tags). The crux of the argument here is the following: Is a stackforge project able to propose additions to global-requirements.txt that aren't used by any projects in OpenStack. I believe the answer is firmly *no*. global-requirements.txt provides a way for us to have a single point of vetting for requirements for OpenStack. It lets us assess licensing, maturity, current state of packaging, python3 support, all in one place. And it lets us enforce that integration of OpenStack projects all run under a well understood set of requirements. Allowing Stackforge projects to use this as their base set of dependencies, while still taking additional dependencies, makes sense to me. I don't really understand this GTFO stance. Solum wants to depend on mistralclient - that seems like a perfectly reasonable thing to want to do. And they also appear to not want to stray any further from the base set of dependencies shared by OpenStack projects - that also seems like a good thing. Now, perhaps the mechanics are tricky, and perhaps we don't want to enable Stackforge projects to do stuff like pin to a different version of SQLAlchemy, and perhaps this proposal isn't the ideal solution, and perhaps infra/others don't want to spend a lot of energy on something specifically for Stackforge projects ... but I don't see anything fundamentally wrong with what they want to do. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Policy around Requirements Adds
On Tue, 2014-07-08 at 06:26 -0400, Sean Dague wrote: On 07/08/2014 04:33 AM, Mark McLoughlin wrote: On Mon, 2014-07-07 at 16:46 -0400, Sean Dague wrote: This thread was unfortunately hidden under a project specific tag (I have thus stripped all the tags). The crux of the argument here is the following: Is a stackforge project project able to propose additions to global-requirements.txt that aren't used by any projects in OpenStack. I believe the answer is firmly *no*. global-requirements.txt provides a way for us to have a single point of vetting for requirements for OpenStack. It lets us assess licensing, maturity, current state of packaging, python3 support, all in one place. And it lets us enforce that integration of OpenStack projects all run under a well understood set of requirements. Allowing Stackforge projects use this as their base set of dependencies, while still taking additional dependencies makes sense to me. I don't really understand this GTFO stance. Solum wants to depend on mistralclient - that seems like a perfectly reasonable thing to want to do. And they also appear to not want to stray any further from the base set of dependencies shared by OpenStack projects - that also seems like a good thing. Now, perhaps the mechanics are tricky, and perhaps we don't want to enable Stackforge projects do stuff like pin to a different version of SQLalchemy, and perhaps this proposal isn't the ideal solution, and perhaps infra/others don't want to spend a lot of energy on something specifically for Stackforge projects ... but I don't see something fundamentally wrong with what they want to do. Once it's in global requirements, any OpenStack project can include it in their requirements. Modifying that file for only stackforge projects is what I'm against. If the solum team would like to write up a partial sync mechanism, that's fine. It just needs to not be impacting the enforcement mechanism we actually need for OpenStack projects. Totally agree. Solum taking a dependency on mistralclient shouldn't e.g. allow glance to take a dependency on mistralclient. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
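Conceptually, a partial sync mechanism of the sort mentioned above is just a check that a project's requirements stay within global-requirements plus an explicitly declared list of extras. A hand-wavy sketch - the file names and the very idea of an 'extra-requirements.txt' are invented here purely for illustration:

    # Hand-wavy sketch of a partial requirements sync check; the file names
    # and 'extra-requirements.txt' are invented for illustration.
    def canonical_name(line):
        # crude: strip version specifiers and environment markers
        for sep in ('==', '>=', '<=', '!=', '<', '>', ';', '['):
            line = line.split(sep)[0]
        return line.strip().lower()

    def read_requirements(path):
        with open(path) as f:
            return {canonical_name(line) for line in f
                    if line.strip() and not line.startswith('#')}

    def check(project_reqs, global_reqs, allowed_extras):
        stray = (read_requirements(project_reqs)
                 - read_requirements(global_reqs)
                 - read_requirements(allowed_extras))
        if stray:
            raise SystemExit('requirements not in global-requirements '
                             'or the declared extras: %s' % sorted(stray))

The key property is the one Sean and Mark agree on above: nothing in the extras list leaks back into what integrated projects are allowed to depend on.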
[openstack-dev] [all] oslo.messaging 1.4.0.0a3 released
The Oslo team is pleased to announce the release of oslo.messaging 1.4.0.0a3, another pre-release in the 1.4.0 series for oslo.messaging during the Juno cycle: https://pypi.python.org/pypi/oslo.messaging/1.4.0.0a3 oslo.messaging provides an API which supports RPC and notifications over a number of different messaging transports. Full details of the 1.4.0.0a3 release is available here: http://docs.openstack.org/developer/oslo.messaging/#a3 Please report problems using the oslo.messaging bug tracker: https://bugs.launchpad.net/oslo.messaging Thanks to all those who contributed to the release! Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
On Sun, 2014-07-06 at 09:28 -0400, Eoghan Glynn wrote: This is an attempt to summarize a really useful discussion that Victor, Flavio and I have been having today. At the bottom are some background links - basically what I have open in my browser right now thinking through all of this. Thanks for the detailed summary, it puts a more flesh on the bones than a brief conversation on the fringes of the Paris mid-cycle. Just a few clarifications and suggestions inline to add into the mix. We're attempting to take baby-steps towards moving completely from eventlet to asyncio/trollius. The thinking is for Ceilometer to be the first victim. First beneficiary, I hope :) Ceilometer's code is run in response to various I/O events like REST API requests, RPC calls, notifications received, etc. We eventually want the asyncio event loop to be what schedules Ceilometer's code in response to these events. Right now, it is eventlet doing that. Yes. And there is one other class of stimulus, also related to eventlet, that is very important for triggering the execution of ceilometer logic. That would be the timed tasks that drive polling of: * REST APIs provided by other openstack services * the local hypervisor running on each compute node * the SNMP daemons running at host-level etc. and also trigger periodic alarm evaluation. IIUC these tasks are all mediated via the oslo threadgroup's usage of eventlet.greenpool[1]. Would this logic also be replaced as part of this effort? As part of the broader switch from eventlet to asyncio effort, yes absolutely. At the core of any event loop is code to do select() (or equivalents) waiting for file descriptors to become readable or writable, or timers to expire. We want to switch from the eventlet event loop to the asyncio event loop. The ThreadGroup abstraction from oslo-incubator is an interface to the eventlet event loop. When you do: self.tg.add_timer(interval, self._evaluate_assigned_alarms) You're saying run evaluate_assigned_alarms() every $interval seconds, using select() to sleep between executions. When you do: self.tg.add_thread(self.start_udp) you're saying run some code which will either run to completion or set wait for fd or timer events using select(). The asyncio versions of those will be: event_loop.call_later(delay, callback) event_loop.call_soon(callback) where the supplied callbacks will be asyncio 'coroutines' which rather than doing: def foo(...): buf = read(fd) and rely on eventlet's monkey patch to cause us to enter the event loop's select() when the read() blocks, we instead do: @asyncio.coroutine def foo(...): buf = yield from read(fd) which shows exactly where we might yield to the event loop. The challenge is that porting code like the foo() function above is pretty invasive and we can't simply port an entire service at once. So, we need to be able to support a service using both eventlet-reliant code and asyncio coroutines. In your example of the openstack.common.threadgroup API - we would initially need to add support for scheduling asyncio coroutine callback arguments as eventlet greenthreads in add_timer() and add_thread(), and later we would port threadgroup itself to rely completely on asyncio. Now, because we're using eventlet, the code that is run in response to these events looks like synchronous code that makes a bunch of synchronous calls. 
For example, the code might do some_sync_op() and that will cause a context switch to a different greenthread (within the same native thread) where we might handle another I/O event (like a REST API request) Just to make the point that most of the agents in the ceilometer zoo tend to react to just a single type of stimulus, as opposed to a mix of dispatching from both message bus and the REST API. So to classify, we'd have: * compute-agent: timer tasks for polling * central-agent: timer tasks for polling * notification-agent: dispatch of external notifications from the message bus * collector: dispatch of internal metering messages from the message bus * api-service: dispatch of REST API calls * alarm-evaluator: timer tasks for alarm evaluation * alarm-notifier: dispatch of internal alarm notifications IIRC, the only case where there's a significant mix of trigger styles is the partitioned alarm evaluator, where assignments of alarm subsets for evaluation is driven over RPC, whereas the actual thresholding is triggered by a timer. Cool, that's helpful. I think the key thing is deciding which stimulus (and hence agent) we should start with. Porting from eventlet's implicit async approach to asyncio's explicit async API will be seriously time consuming and we need to be able to do it piece-by-piece. Yes, I agree, a step-wise approach is the key here. So I'd love to have some sense of the time horizon for this effort. It clearly feels like a
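For anyone trying to picture the timer mapping discussed above, here is a minimal, illustrative sketch (not real Ceilometer or oslo code - the function name and the 60 second interval are placeholders) of how a ThreadGroup.add_timer() style periodic task might be expressed directly against the asyncio event loop:

    import asyncio

    def evaluate_assigned_alarms():
        pass  # placeholder for the real evaluation logic

    def add_timer(loop, interval, callback):
        # rough analogue of ThreadGroup.add_timer(): run the callback every
        # 'interval' seconds by re-arming a call_later() timer after each run
        def _tick():
            callback()
            loop.call_later(interval, _tick)
        loop.call_later(interval, _tick)

    loop = asyncio.get_event_loop()
    add_timer(loop, 60.0, evaluate_assigned_alarms)
    # loop.run_forever()  # uncomment to actually start the periodic task

Once the callback itself becomes a coroutine, it would be scheduled on the loop rather than called directly, which is exactly the piece-by-piece porting work described above.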
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
On Mon, 2014-07-07 at 15:53 +0100, Gordon Sim wrote: On 07/07/2014 03:12 PM, Victor Stinner wrote: The first step is to patch endpoints to add @trollius.coroutine to the methods, and add yield From(...) on asynchronous tasks. What are the 'endpoints' here? Are these internal to the oslo.messaging library, or external to it? The callback functions we dispatch to are called 'endpoint methods' - e.g. they are methods on the 'endpoints' objects passed to get_rpc_server(). Later we may modify Oslo Messaging to be able to call an RPC method asynchronously, a method which would return a Trollius coroutine or task directly. The problem is that Oslo Messaging currently hides implementation details like eventlet. I guess my question is how effectively does it hide it? If the answer to the above is that this change can be contained within the oslo.messaging implementation itself, then that would suggest its hidden reasonably well. If, as I first understood (perhaps wrongly) it required changes to every use of the oslo.messaging API, then it wouldn't really be hidden. Returning a Trollius object means that Oslo Messaging will use explicitly Trollius. I'm not sure that OpenStack is ready for that today. The oslo.messaging API could evolve/expand to include explicitly asynchronous methods that did not directly expose Trollius. I'd expect us to add e.g. @asyncio.coroutine def call_async(self, ctxt, method, **kwargs): ... to RPCClient. Perhaps we'd need to add an AsyncRPCClient in a separate module and only add the method there - I don't have a good sense of it yet. However, the key thing is that I don't anticipate us needing to change the current API in a backwards incompatible way. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
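To make the shape of that hypothetical API a little more concrete, here is a sketch of how an explicitly asynchronous call might look from the caller's side; call_async() does not exist in oslo.messaging today, so the method name, target and arguments are all assumptions, and the Python 3.4-era coroutine spelling used elsewhere in this thread is kept:

    import asyncio

    @asyncio.coroutine
    def get_host_uptime(client, ctxt, host):
        # 'client' is assumed to be an RPCClient that has grown a call_async()
        # coroutine; the call yields to the event loop instead of blocking
        uptime = yield from client.call_async(ctxt, 'get_host_uptime', host=host)
        return uptime

The existing synchronous call() would remain untouched, which is why this could be added without changing the current API in a backwards incompatible way.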
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
On Mon, 2014-07-07 at 18:11 +, Angus Salkeld wrote: On 03/07/14 05:30, Mark McLoughlin wrote: Hey This is an attempt to summarize a really useful discussion that Victor, Flavio and I have been having today. At the bottom are some background links - basically what I have open in my browser right now thinking through all of this. We're attempting to take baby-steps towards moving completely from eventlet to asyncio/trollius. The thinking is for Ceilometer to be the first victim. Has this been widely agreed on? It seems to me like we are mixing two issues: 1) we need to move to py3 2) some people want to move from eventlet (I am not convinced that the volume of code changes warrants the end goal - and review load) To achieve 1) in a lower risk change, shouldn't we rather run eventlet on top of asyncio? - i.e. not require widespread code changes. So we can maintain the main loop API but move to py3. I am not sure on the feasibility, but seems to me like a more contained change. Right - it's important that we see these orthogonal questions, particularly now that it appears eventlet is likely to be available for Python 3 soon. For example, if it was generally agreed that we all want to end up on Python 3 with asyncio in the long term, you could imagine deploying (picking random examples) Glance with Python 3 and eventlet, but Ceilometer with Python 2 and asyncio/trollius. However, I don't have a good handle on how your suggestion of switching to the asyncio event loop without widespread code changes would work? Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo][all] autosync incubator to projects
On Fri, 2014-07-04 at 15:31 +0200, Ihar Hrachyshka wrote: Hi all, at the moment we have several bot jobs that sync contents to affected projects: - translations are copied from transifex; - requirements are copied from global requirements repo. We have another source of common code - oslo-incubator, though we still rely on people manually copying the new code from there to affected projects. This results in old, buggy, and sometimes completely different versions of the same code in all projects. I wonder why don't we set another bot to sync code from incubator? In that way, we would: - reduce work to do for developers [I hope everyone knows how boring it is to fill in commit message with all commits synchronized and create sync requests for 10 projects at once]; - make sure all projects use (almost) the same code; - ensure projects are notified in advance in case API changed in one of the modules that resulted in failures in gate; - our LOC statistics will be a bit more fair ;) (currently, the one who syncs a large piece of code from incubator to a project, gets all the LOC credit at e.g. stackalytics.com). The changes will still be gated, so any failures and incompatibilities will be caught. I even don't expect most of sync requests to fail at all, meaning it will be just a matter of two +2's from cores. I know that Oslo team works hard to graduate lots of modules from incubator to separate libraries with stable API. Still, I guess we'll live with incubator at least another cycle or two. What are your thoughts on that? Just repeating what I said on IRC ... The point of oslo-incubator is that it's a place where APIs can be cleaned up so that they are ready for graduation. Code living in oslo-incubator for a long time with unchanging APIs is not the idea. An automated sync job would IMHO discourage API cleanup work. I'd expect people would start adding lots of ugly backwards API compat hacks with their API cleanups just to stop people complaining about failing auto-syncs. That would be the opposite of what we're trying to achieve. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [oslo] Asyncio and oslo.messaging
Hey This is an attempt to summarize a really useful discussion that Victor, Flavio and I have been having today. At the bottom are some background links - basically what I have open in my browser right now thinking through all of this. We're attempting to take baby-steps towards moving completely from eventlet to asyncio/trollius. The thinking is for Ceilometer to be the first victim. Ceilometer's code is run in response to various I/O events like REST API requests, RPC calls, notifications received, etc. We eventually want the asyncio event loop to be what schedules Ceilometer's code in response to these events. Right now, it is eventlet doing that. Now, because we're using eventlet, the code that is run in response to these events looks like synchronous code that makes a bunch of synchronous calls. For example, the code might do some_sync_op() and that will cause a context switch to a different greenthread (within the same native thread) where we might handle another I/O event (like a REST API request) while we're waiting for some_sync_op() to return: def foo(self): result = some_sync_op() # this may yield to another greenlet return do_stuff(result) Eventlet's infamous monkey patching is what make this magic happen. When we switch to asyncio's event loop, all of this code needs to be ported to asyncio's explicitly asynchronous approach. We might do: @asyncio.coroutine def foo(self): result = yield from some_async_op(...) return do_stuff(result) or: @asyncio.coroutine def foo(self): fut = Future() some_async_op(callback=fut.set_result) ... result = yield from fut return do_stuff(result) Porting from eventlet's implicit async approach to asyncio's explicit async API will be seriously time consuming and we need to be able to do it piece-by-piece. The question then becomes what do we need to do in order to port a single oslo.messaging RPC endpoint method in Ceilometer to asyncio's explicit async approach? The plan is: - we stick with eventlet; everything gets monkey patched as normal - we register the greenio event loop with asyncio - this means that e.g. when you schedule an asyncio coroutine, greenio runs it in a greenlet using eventlet's event loop - oslo.messaging will need a new variant of eventlet executor which knows how to dispatch an asyncio coroutine. For example: while True: incoming = self.listener.poll() method = dispatcher.get_endpoint_method(incoming) if asyncio.iscoroutinefunc(method): result = method() self._greenpool.spawn_n(incoming.reply, result) else: self._greenpool.spawn_n(method) it's important that even with a coroutine endpoint method, we send the reply in a greenthread so that the dispatch greenthread doesn't get blocked if the incoming.reply() call causes a greenlet context switch - when all of ceilometer has been ported over to asyncio coroutines, we can stop monkey patching, stop using greenio and switch to the asyncio event loop - when we make this change, we'll want a completely native asyncio oslo.messaging executor. Unless the oslo.messaging drivers support asyncio themselves, that executor will probably need a separate native thread to poll for messages and send replies. If you're confused, that's normal. We had to take several breaks to get even this far because our brains kept getting fried. HTH, Mark. 
Victor's excellent docs on asyncio and trollius: https://docs.python.org/3/library/asyncio.html http://trollius.readthedocs.org/ Victor's proposed asyncio executor: https://review.openstack.org/70948 The case for adopting asyncio in OpenStack: https://wiki.openstack.org/wiki/Oslo/blueprints/asyncio A previous email I wrote about an asyncio executor: http://lists.openstack.org/pipermail/openstack-dev/2013-June/009934.html The mock-up of an asyncio executor I wrote: https://github.com/markmc/oslo-incubator/blob/8509b8b/openstack/common/messaging/_executors/impl_tulip.py My blog post on async I/O and Python: http://blogs.gnome.org/markmc/2013/06/04/async-io-and-python/ greenio - greelets support for asyncio: https://github.com/1st1/greenio/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
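As a rough, self-contained illustration of the executor change sketched in the mail above - this is not the real oslo.messaging code; listener.poll(), incoming.reply() and get_endpoint_method() are stand-ins for internal details, the greenio event loop is assumed to already be registered, and the real predicate is asyncio.iscoroutinefunction(), not iscoroutinefunc():

    import asyncio
    import eventlet

    class CoroutineAwareEventletExecutor(object):
        def __init__(self, listener, dispatcher):
            self.listener = listener
            self.dispatcher = dispatcher
            self._greenpool = eventlet.GreenPool()

        def _dispatch_coroutine(self, method, incoming):
            # with greenio registered, running the coroutine here just
            # schedules it as a greenlet on eventlet's hub
            loop = asyncio.get_event_loop()
            result = loop.run_until_complete(method())
            # reply from its own greenthread so this one isn't blocked if
            # incoming.reply() causes a greenlet context switch
            self._greenpool.spawn_n(incoming.reply, result)

        def run(self):
            while True:
                incoming = self.listener.poll()
                method = self.dispatcher.get_endpoint_method(incoming)
                if asyncio.iscoroutinefunction(method):
                    self._greenpool.spawn_n(self._dispatch_coroutine,
                                            method, incoming)
                else:
                    self._greenpool.spawn_n(method)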
Re: [openstack-dev] [infra][oslo][neutron] Need help getting oslo.messaging 1.4.0.0a2 in global requirements
On Mon, 2014-06-30 at 16:52 +, Paul Michali (pcm) wrote: I have out for review 103536 to add this version to global requirements, so that Neutron has an oslo fix (review 102909) for encoding failure, which affects some gate runs. This review for global requirements is failing requirements check (http://logs.openstack.org/36/103536/1/check/check-requirements-integration-dsvm/6d9581c/console.html#_2014-06-30_12_34_56_921). I did a recheck bug 1334898, but see the same error, with the release not found, even though it is in PyPI. Infra folks say this is a known issue with pushing out pre-releases. Do we have a work-around? Any proposed solution to try? That makes two oslo alpha releases which are failing openstack/requirements checks: https://review.openstack.org/103256 https://review.openstack.org/103536 and an issue with the py27 stable/icehouse test jobs seemingly pulling in oslo.messaging 1.4.0.0a2: http://lists.openstack.org/pipermail/openstack-dev/2014-June/039021.html and these comments on IRC: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2014-06-30.log 2014-06-30T15:27:33 pcm__ hi. Need help with getting latest oslo.messaging release added to global requirements. Can someone advise on the issues I see. 2014-06-30T15:28:06 mordred pcm__: there are issues adding oslo pre-releases to the mirror right now - we're working on a solution ... so you're not alone at least :) 2014-06-30T15:29:02 pcm__ mordred: Jenkins failed saying that it could not find the release, but it is available. 2014-06-30T15:29:31 bknudson pcm__: mordred: is the fix to remove the check for --no-use-wheel in the check-requirements-integration-dsvm ? 2014-06-30T15:29:55 mordred bknudson: nope. it's to completely change our mirroring infrastructure :) Presumably there's more information somewhere on what solution infra are working on, but that's all I got ... We knew this pre-release-with-wheels stuff was going to be a little rocky, so this isn't surprising. Hopefully it'll get sorted out soon. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [OpenStack-Dev] OSLO messaging update and icehouse config files
On Mon, 2014-06-30 at 15:35 -0600, John Griffith wrote: On Mon, Jun 30, 2014 at 3:17 PM, Mark McLoughlin mar...@redhat.com wrote: On Mon, 2014-06-30 at 12:04 -0600, John Griffith wrote: Hey Everyone, So I sent a note out yesterday asking about config changes brought in to Icehouse due to the OSLO Messaging update that went out over the week-end here. My initial email prior to realizing the update that caused the problem was OSLO Messaging update here [1]. (Periodic reminder that Oslo is not an acronym) In the meantime I tried updating the cinder.conf sample in Cinder's stable Icehouse branch, but noticed that py26 doesn't seem to pick up the changes when running the oslo conf generation tools against oslo.messaging. I haven't spent any time digging into this yet, was hoping that perhaps somebody from the OSLO team or somewhere else maybe had some insight as to what's going on here. Here's the patch I submitted that shows the failure on py26 and success on py27 [2]. I'll get around to this eventually if nobody else knows anything off the top of their head. Thanks, John [1]: http://lists.openstack.org/pipermail/openstack-dev/2014-June/038926.html [2]: https://review.openstack.org/#/c/103426/ Ok, that new cinder.conf.sample is showing changes caused by these oslo.messaging changes: https://review.openstack.org/101583 https://review.openstack.org/99291 Both of those changes were first released in 1.4.0.0a1 which is an alpha version targeting Juno and are not available in the 1.3.0 Icehouse version - i.e. 1.4.0.0a1 should not be used with stable/icehouse Cinder. It seems 1.3.0 *is* being used: http://logs.openstack.org/26/103426/1/check/gate-cinder-python26/5c6c1dd/console.html 2014-06-29 19:17:50.154 | oslo.messaging==1.3.0 and the output is just confusing: 2014-06-29 19:17:49.900 | --- /tmp/cinder.UtGHjm/cinder.conf.sample 2014-06-29 19:17:50.270071741 + 2014-06-29 19:17:49.900 | +++ etc/cinder/cinder.conf.sample 2014-06-29 19:10:48.396072037 + ... 2014-06-29 19:17:49.903 | +[matchmaker_redis] 2014-06-29 19:17:49.903 | + i.e. it's showing that the file you proposed was generated with 1.4.0.0a1 and the file generated during the test job was generated with 1.3.0. Which is what I'd expect - the update you proposed is not appropriate for stable/icehouse. So why is the py27 job passing? http://logs.openstack.org/26/103426/1/check/gate-cinder-python27/7844c61/console.html 2014-06-29 19:21:12.875 | oslo.messaging==1.4.0.0a2 That's the problem right there - 1.4.0.0a2 should not be getting installed on the stable/icehouse branch. I'm not sure why it is. Someone on #openstack-infra could probably help figure it out. Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Thanks Mark... so the problem is Oslo messaging in requirements is >= in stable/icehouse [1](please note I used Oslo not OSLO). [1]: https://github.com/openstack/requirements/blob/stable/icehouse/global-requirements.txt#L49 Thanks for pointing me in the right direction. Ah, yes! This is the problem: oslo.messaging>=1.3.0a9 This essentially allows *any* alpha release of oslo.messaging to be used. We should change stable/icehouse to simply be: oslo.messaging>=1.3.0 I'm happy to do that tomorrow, but I suspect you'll get there first Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] oslo.messaging 1.4.0.0a1 released
On Fri, 2014-06-27 at 13:28 +, Paul Michali (pcm) wrote: Mark, When would we be able to get a release of Oslo with 102909 fix in? It’s preventing Jenkins passing for some commits in Neutron. I've just pushed 1.4.0.0a2 with the following changes: 244a902 Fix the notifier example a7f01d9 Fix slow notification listener tests da2abaa Fix formatting of TransportURL.parse() docs 13fc9f2 Fix info method of ListenerSetupMixin 0cfafac encoding error in file 0102aa9 Replace usage of str() with six.text_type Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] 'retry' option
On Fri, 2014-06-27 at 17:02 +0100, Gordon Sim wrote: A question about the new 'retry' option. The doc says: By default, cast() and call() will block until the message is successfully sent. What does 'successfully sent' mean here? Unclear, ambiguous, probably driver dependent etc. The 'blocking' we're talking about here is establishing a connection with the broker. If the connection has been lost, then cast() will block until the connection has been re-established and the message 'sent'. Does it mean 'written to the wire' or 'accepted by the broker'? For the impl_qpid.py driver, each send is synchronous, so it means accepted by the broker[1]. What does the impl_rabbit.py driver do? Does it just mean 'written to the wire', or is it using RabbitMQ confirmations to get notified when the broker accepts it (standard 0-9-1 has no way of doing this). I don't know, but it would be nice if someone did take the time to figure it out and document it :) Seriously, some docs around the subtle ways that the drivers differ from one another would be helpful ... particularly if it exposed incorrect assumptions API users are currently making. If the intention is to block until accepted by the broker that has obvious performance implications. On the other hand if it means block until written to the wire, what is the advantage of that? Was that a deliberate feature or perhaps just an accident of implementation? The use case for the new parameter, as described in the git commit, seems to be motivated by wanting to avoid the blocking when sending notifications. I can certainly understand that desire. However, notifications and casts feel like inherently asynchronous things to me, and perhaps having/needing the synchronous behaviour is the real issue? It's not so much about sync vs async, but a failure mode. By default, if we lose our connection with the broker, we wait until we can re-establish it rather than throwing exceptions (requiring the API caller to have its own retry logic) or quietly dropping the message. The use case for ceilometer is to allow its RPCPublisher to have a publishing policy - block until the samples have been sent, queue (in an in-memory, fixed-length queue) if we don't have a connection to the broker, or drop it if we don't have a connection to the broker. https://review.openstack.org/77845 I do understand the ambiguity around what message delivery guarantees are implicit in cast() isn't ideal, but that's not what adding this 'retry' parameter was about. Calls by contrast, are inherently synchronous, but at present the retry controls only the sending of the request. If the server fails, the call may timeout regardless of the value of 'retry'. Just in passing, I'd suggest that renaming the new parameter max_reconnects, would make it's current behaviour and values clearer. The name 'retry' sounds like a yes/no type value, and retry=0 v. retry=1 is the reverse of what I would intuitively expect. Sounds reasonable. Would you like to submit a patch? Quick turnaround is important, because if Ceilometer starts using this retry parameter before we rename it, I'm not sure it'll be worth the hassle. Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
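For reference, a hedged sketch of what using the new knob looks like from the API side, keeping the current 'retry' name; the import style is the Juno-era one, and the publisher_id, topics and retry values are purely illustrative (retry=0 meaning no reconnection attempts, retry=None meaning retry forever):

    from oslo.config import cfg
    from oslo import messaging

    transport = messaging.get_transport(cfg.CONF)

    # notifications: let the caller queue or drop samples itself rather
    # than blocking if the connection to the broker is down
    notifier = messaging.Notifier(transport, publisher_id='compute.host1',
                                  driver='messaging', topic='notifications',
                                  retry=0)

    # RPC: give up after two reconnection attempts instead of blocking
    target = messaging.Target(topic='metering', version='1.0')
    client = messaging.RPCClient(transport, target, retry=2)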
Re: [openstack-dev] [hacking] community consensus and removing rules
On Mon, 2014-06-23 at 19:55 -0700, Joe Gordon wrote: * Add a new directory, contrib, for local rules that multiple projects use but are not generally considered acceptable to be enabled by default. This way we can reduce the amount of cut and pasted code (thank you to Ben Nemec for this idea). All sounds good to me, apart from a pet peeve on 'contrib' directories. What does 'contrib' mean? 'contributed'? What exactly *isn't* contributed? Often it has connotations of 'contributed by outsiders'. It also often has connotations of 'bucket for crap', 'unmaintained and untested', YMMV, etc. etc. Often the name is just chosen out of laziness - I can't think of a good name for this, and projects often have a contrib directory with random stuff in it, so that works. Let's be precise - these are optional rules, right? How about calling the directory 'optional'? Say no to contrib directories! :-P Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [hacking] rules for removal
On Tue, 2014-06-24 at 09:51 -0700, Clint Byrum wrote: Excerpts from Monty Taylor's message of 2014-06-24 06:48:06 -0700: On 06/22/2014 02:49 PM, Duncan Thomas wrote: On 22 June 2014 14:41, Amrith Kumar amr...@tesora.com wrote: In addition to making changes to the hacking rules, why don't we mandate also that perceived problems in the commit message shall not be an acceptable reason to -1 a change. -1. There are some /really/ bad commit messages out there, and some of us try to use the commit messages to usefully sort through the changes (i.e. I often -1 in cinder a change only affects one driver and that isn't clear from the summary). If the perceived problem is grammatical, I'm a bit more on board with it not a reason to rev a patch, but core reviewers can +2/A over the top of a -1 anyway... 100% agree. Spelling and grammar are rude to review on - especially since we have (and want) a LOT of non-native English speakers. It's not our job to teach people better grammar. Heck - we have people from different English backgrounds with differing disagreements on what good grammar _IS_ We shouldn't quibble over _anything_ grammatical in a commit message. If there is a disagreement about it, the comments should be ignored. There are definitely a few grammar rules that are loose and those should be largely ignored. However, we should correct grammar when there is a clear solution, as those same people who do not speak English as their first language are likely to be confused by poor grammar. We're not doing it to teach grammar. We're doing it to ensure readability. The importance of clear English varies with context, but commit messages are a place where we should try hard to just let it go, particularly with those who do not speak English as their first language. Commit messages stick around forever and it's important that they are useful, but they will be read by a small number of people who are going to be in a position to spend a small amount of time getting over whatever dissonance is caused by a typo or imperfect grammar. I think specs are pretty similar and don't warrant much additional grammar nitpicking. Sure, they're longer pieces of text and slightly more people will rely on them for information, but they're not intended to be complete documentation. Where grammar is so poor that readers would be easily misled in important ways, then sure that should be fixed. But there comes a point when we're no longer working to avoid confusion and instead just being pendants. Taking issue[1] with this: whatever scaling mechanism Heat and we end up going with. because it has a dangling preposition is an example of going way beyond the point of productive pedantry IMHO :-) Mark. [1] - https://review.openstack.org/#/c/97939/5/specs/juno/remove-mergepy.rst ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [hacking] rules for removal
On Tue, 2014-06-24 at 13:56 -0700, Clint Byrum wrote: Excerpts from Mark McLoughlin's message of 2014-06-24 12:49:52 -0700: On Tue, 2014-06-24 at 09:51 -0700, Clint Byrum wrote: Excerpts from Monty Taylor's message of 2014-06-24 06:48:06 -0700: On 06/22/2014 02:49 PM, Duncan Thomas wrote: On 22 June 2014 14:41, Amrith Kumar amr...@tesora.com wrote: In addition to making changes to the hacking rules, why don't we mandate also that perceived problems in the commit message shall not be an acceptable reason to -1 a change. -1. There are some /really/ bad commit messages out there, and some of us try to use the commit messages to usefully sort through the changes (i.e. I often -1 in cinder a change only affects one driver and that isn't clear from the summary). If the perceived problem is grammatical, I'm a bit more on board with it not a reason to rev a patch, but core reviewers can +2/A over the top of a -1 anyway... 100% agree. Spelling and grammar are rude to review on - especially since we have (and want) a LOT of non-native English speakers. It's not our job to teach people better grammar. Heck - we have people from different English backgrounds with differing disagreements on what good grammar _IS_ We shouldn't quibble over _anything_ grammatical in a commit message. If there is a disagreement about it, the comments should be ignored. There are definitely a few grammar rules that are loose and those should be largely ignored. However, we should correct grammar when there is a clear solution, as those same people who do not speak English as their first language are likely to be confused by poor grammar. We're not doing it to teach grammar. We're doing it to ensure readability. The importance of clear English varies with context, but commit messages are a place where we should try hard to just let it go, particularly with those who do not speak English as their first language. Commit messages stick around forever and it's important that they are useful, but they will be read by a small number of people who are going to be in a position to spend a small amount of time getting over whatever dissonance is caused by a typo or imperfect grammar. The times that one is reading git messages are often the most stressful such as when a regression has occurred in production. Given that, I believe it is entirely worth it to me that the commit messages on my patches are accurate and understandable. I embrace all feedback which leads to them being more clear. I will of course stand back from grammar correcting and not block patches if there are many who disagree. I think specs are pretty similar and don't warrant much additional grammar nitpicking. Sure, they're longer pieces of text and slightly more people will rely on them for information, but they're not intended to be complete documentation. Disagree. I will only state this one more time as I think everyone knows how I feel: if we are going to grow beyond the english-as-a-first-language world we simply cannot assume that those reading specs will be native speakers. Good spelling and grammar helps us grow. Bad spelling and grammar holds us back. There's two sides to this coin - concern about alienating non-english-as-a-first-language speakers who feel undervalued because their language is nitpicked to death and concern about alienating english-as-a-first-language speakers who struggle to understand unclear or incorrect language. 
Obviously there's a balance to be struck there and different people will judge that differently, but I'm personally far more concerned about the former rather than the latter case. I expect many beyond the english-as-a-first-language world are pretty used to dealing with imperfect language but aren't so delighted with being constantly reminded that their use language is imperfect. Where grammar is so poor that readers would be easily misled in important ways, then sure that should be fixed. But there comes a point when we're no longer working to avoid confusion and instead just being pendants. Taking issue[1] with this: whatever scaling mechanism Heat and we end up going with. because it has a dangling preposition is an example of going way beyond the point of productive pedantry IMHO :-) I actually agree that it would not at all be a reason to block a patch. However, there is some ambiguity in that sentence that may not be clear to a native speaker. It is not 100% clear if we are going with Heat, or with the scaling mechanism. That is the only reason for the dangling preposition debate. I'd wager you'd seriously struggle to find anyone who would interpret that sentence as we are going with Heat, even if they were non-english-as-a-first-language speakers who had never heard of OpenStack or
Re: [openstack-dev] [hacking] rules for removal
On Sat, 2014-06-21 at 07:36 -0700, Clint Byrum wrote: Excerpts from Sean Dague's message of 2014-06-21 05:08:01 -0700: Pedantic reviewers that are reviewing for this kind of thing only should be scorned. I realistically like the idea markmc came up with - https://twitter.com/markmc_/status/480073387600269312 I also agree it is really fun to think about shaming those annoying actions. It is also not fun _at all_ to be publicly shamed. In fact I'd say it is at least an order of magnitude less fun. There is an old saying, praise in public, punish in private. It is one reason the -1 comments I give always include praise for whatever is right for new contributors. Not everyone is a grizzled veteran. It is far more interesting to me to solve the grouping problem in a way that works for us long term (python 2 and 3) than it is to develop a culture that builds any of its core activities on negative emotional feedback. That's not to say we can't say hey you're doing it wrong. I mean to say that direct feedback like that belongs in private IRC messages or email, not in public everyone can see that reviews. Give people a chance to save face. Meanwhile, the less we have to have one on one negative feedback, the easier the job of reviewers is. The last thing we want to do is have more reasons for people to NOT do reviews. You're right that something like I suggested could easily lead to more negative energy in the project, not less. What I had in mind was that we could laugh at ourselves about this. Assuming that the reviewers called out would be fully on-board and willing to laugh along at being the most pedantic nerd of the week. Yeah, that's probably wishful thinking. Maybe it could be anonymous. Maybe instead it could be a weekly mailing list discussion so that we could all discuss as a community whether that kind of feedback on a review is appropriate. The main point is that this is something worth addressing as a wider community rather than in individual reviews with a limited audience. And that doing it with a bit of humor might help take the sting out of it. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Octavia] PTL and core team members
On Thu, 2014-06-19 at 20:36 -0700, Dustin Lundquist wrote: Dolph, I appreciate the suggestion. In the mean time how does the review process work without core developers to approve gerrit submissions? If you're just getting started, have a small number (possibly just 1 to begin with) of developers collaborate closely, with the minimum possible process and then use that list of developers as your core review team when you gradually start adopting some process. Aim to get from zero to bootstrapped with that core team in a small number of weeks at most. Minimum possible process could mean a git repo anywhere that those initial developers have direct push access to. You could use stackforge from the beginning and the developers just approve their own changes, but that's a bit annoying. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Oslo] Translating log and exception messages in Oslo libraries
Hi I'm not sure we've ever discussed this before, but I had previously figured that we shouldn't translate log and exception messages in oslo.messaging. My thinking is: - it seems like an odd thing for a library to do, I don't know of examples of other libraries doing this .. but I haven't gone looking - it involves a dependency on oslo.i18n - more than just marking strings for translation and using gettextutils, you also need to set up the infrastructure for pushing the .pot files to transifex, pulling the .po files from transifex and installing the .mo files at install time. I don't feel terribly strongly about this except that unless someone is willing to see this through and do the transifex and install-time work, we shouldn't be doing the use-oslo.i18n and mark-strings-for-translation work. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
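For a sense of what the mark-strings-for-translation half of that work actually involves, here is a rough sketch using oslo.i18n's TranslatorFactory (the gettext domain and message are illustrative, and the later oslo_i18n module spelling is used):

    import oslo_i18n

    # each library sets up its own translators under its own gettext domain
    _translators = oslo_i18n.TranslatorFactory(domain='oslo.messaging')
    _ = _translators.primary       # user-facing/exception messages
    _LE = _translators.log_error   # error-level log messages

    def connect(broker_url):
        raise RuntimeError(_('Unable to connect to broker at %s') % broker_url)

The point above stands, though: none of this is worth doing unless the .pot/.po/.mo pipeline behind it is also set up.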
Re: [openstack-dev] [nova] A modest proposal to reduce reviewer load
Hi Armando, On Tue, 2014-06-17 at 14:51 +0200, Armando M. wrote: I wonder what the turnaround of trivial patches actually is, I bet you it's very very small, and as Daniel said, the human burden is rather minimal (I would be more concerned about slowing them down in the gate, but I digress). I think that introducing a two-tier level for patch approval can only mitigate the problem, but I wonder if we'd need to go a lot further, and rather figure out a way to borrow concepts from queueing theory so that they can be applied in the context of Gerrit. For instance Little's law [1] says: The long-term average number of customers (in this context reviews) in a stable system L is equal to the long-term average effective arrival rate, λ, multiplied by the average time a customer spends in the system, W; or expressed algebraically: L = λW. L can be used to determine the number of core reviewers that a project will need at any given time, in order to meet a certain arrival rate and average time spent in the queue. If the number of core reviewers is a lot less than L then that core team is understaffed and will need to increase. If we figured out how to model and measure Gerrit as a queuing system, then we could improve its performance a lot more effectively; for instance, this idea of privileging trivial patches over longer patches has roots in a popular scheduling policy [3] for M/G/1 queues, but that does not really help aging of 'longer service time' patches and does not have a preemption mechanism built-in to avoid starvation. Just a crazy opinion... Armando [1] - http://en.wikipedia.org/wiki/Little's_law [2] - http://en.wikipedia.org/wiki/Shortest_job_first [3] - http://en.wikipedia.org/wiki/M/G/1_queue This isn't crazy at all. We do have a problem that surely could be studied and solved/improved by applying queueing theory or lessons from fields like lean manufacturing. Right now, we're simply applying our intuition and the little I've read about these sorts of problems is that your intuition can easily take you down the wrong path. There's a bunch of things that occur just glancing through those articles: - Do we have an unstable system? Would it be useful to have arrival and exit rate metrics to help highlight this? Over what time period would those rates need to be averaged to be useful? Daily, weekly, monthly, an entire release cycle? - What are we trying to optimize for? The length of time in the queue? The number of patches waiting in the queue? The response time to a new patch revision? - We have a single queue, with a bunch of service nodes with a wide variance between their service rates, very little in the way of scheduling policy, a huge rate of service nodes sending jobs back for rework, a cost associated with maintaining a job while it sits in the queue, the tendency for some jobs to disrupt many other jobs with merge conflicts ... not simple. - Is there any sort of natural limit in our queue size that makes the system stable - e.g. do people naturally just stop submitting patches at some point? My intuition on all of this lately is that we need some way to model and experiment with this queue, and I think we could make some interesting progress if we could turn it into a queueing network rather than a single, extremely complex queue. 
Say we had a front-end for gerrit which tracked which queue a patch is in, we could experiment with things like: - a triage queue, with non-cores signed up as triagers looking for obvious mistakes and choosing the next queue for a patch to enter into - queues having a small number of cores signed up as owners - e.g. high priority bugfix, API, scheduler, object conversion, libvirt driver, vmware driver, etc. - we'd allow for a large number of queues so that cores could aim for an inbox zero approach on individual queues, something that would probably help keep cores motivated. - we could apply different scheduling policies to each of the different queues - i.e. explicit guidance for cores about which patches they should pick off the queue next. - we could track metrics on individual queues as well as the whole network, identifying bottlenecks and properly recognizing which reviewers are doing a small number of difficult reviews versus those doing a high number of trivial reviews. - we could require some queues to feed into a final approval queue where some people are responsible for giving an approved patch a final sanity check - i.e. there would be a class of reviewer with good instincts who quickly churn through already-reviewed patches looking for the kind of mistakes people tend to mistake when they're down in the weeds. - explicit queues for large, cross-cutting changes like coding style changes. Perhaps we could stop servicing these queues
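To put rough numbers on the Little's law argument quoted at the top of this mail, a back-of-the-envelope sketch - the figures are invented for illustration, not measured from Gerrit:

    # Little's law: L = lambda * W
    arrival_rate = 30.0     # patches arriving per day (lambda)
    time_in_review = 5.0    # average days a patch spends in the system (W)

    in_flight = arrival_rate * time_in_review
    print(in_flight)        # -> 150.0 patches sitting in review on average

    # stability check: reviewers must collectively keep up with arrivals
    reviews_per_core_per_day = 4.0
    min_cores = arrival_rate / reviews_per_core_per_day
    print(min_cores)        # -> 7.5, i.e. at least 8 active cores just to
                            #    stop the queue growing without bound

The same arithmetic applied per-queue is what would make bottlenecks in a network of queues like the one sketched above visible.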
[openstack-dev] [oslo] Paris mid-cycle sprint
Hey I had been thinking of going to the Paris sprint: https://wiki.openstack.org/wiki/Sprints/ParisJuno2014 But it only just occurred to me that we could have enough Oslo contributors in Europe to make it worthwhile for us to use the opportunity to get some Oslo stuff done together. For example, Victor (Stinner), Mehdi, Flavio, Victor (Sergeyev), Roman, or others ... perhaps some or all of you would be up for it? Julien will be there too, but will want to focus on Ceilometer I assume. I'll add myself to the wiki ... feel free to do so too. Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] A modest proposal to reduce reviewer load
On Thu, 2014-06-19 at 09:34 +0100, Matthew Booth wrote: On 19/06/14 08:32, Mark McLoughlin wrote: Hi Armando, On Tue, 2014-06-17 at 14:51 +0200, Armando M. wrote: I wonder what the turnaround of trivial patches actually is, I bet you it's very very small, and as Daniel said, the human burden is rather minimal (I would be more concerned about slowing them down in the gate, but I digress). I think that introducing a two-tier level for patch approval can only mitigate the problem, but I wonder if we'd need to go a lot further, and rather figure out a way to borrow concepts from queueing theory so that they can be applied in the context of Gerrit. For instance Little's law [1] says: The long-term average number of customers (in this context reviews) in a stable system L is equal to the long-term average effective arrival rate, λ, multiplied by the average time a customer spends in the system, W; or expressed algebraically: L = λW. L can be used to determine the number of core reviewers that a project will need at any given time, in order to meet a certain arrival rate and average time spent in the queue. If the number of core reviewers is a lot less than L then that core team is understaffed and will need to increase. If we figured out how to model and measure Gerrit as a queuing system, then we could improve its performance a lot more effectively; for instance, this idea of privileging trivial patches over longer patches has roots in a popular scheduling policy [3] for M/G/1 queues, but that does not really help aging of 'longer service time' patches and does not have a preemption mechanism built-in to avoid starvation. Just a crazy opinion... Armando [1] - http://en.wikipedia.org/wiki/Little's_law [2] - http://en.wikipedia.org/wiki/Shortest_job_first [3] - http://en.wikipedia.org/wiki/M/G/1_queue This isn't crazy at all. We do have a problem that surely could be studied and solved/improved by applying queueing theory or lessons from fields like lean manufacturing. Right now, we're simply applying our intuition and the little I've read about these sorts of problems is that your intuition can easily take you down the wrong path. There's a bunch of things that occur just glancing through those articles: - Do we have an unstable system? Would it be useful to have arrival and exit rate metrics to help highlight this? Over what time period would those rates need to be averaged to be useful? Daily, weekly, monthly, an entire release cycle? - What are we trying to optimize for? The length of time in the queue? The number of patches waiting in the queue? The response time to a new patch revision? - We have a single queue, with a bunch of service nodes with a wide variance between their service rates, very little in the way of scheduling policy, a huge rate of service nodes sending jobs back for rework, a cost associated with maintaining a job while it sits in the queue, the tendency for some jobs to disrupt many other jobs with merge conflicts ... not simple. - Is there any sort of natural limit in our queue size that makes the system stable - e.g. do people naturally just stop submitting patches at some point? My intuition on all of this lately is that we need some way to model and experiment with this queue, and I think we could make some interesting progress if we could turn it into a queueing network rather than a single, extremely complex queue. 
Say we had a front-end for gerrit which tracked which queue a patch is in, we could experiment with things like: - a triage queue, with non-cores signed up as triagers looking for obvious mistakes and choosing the next queue for a patch to enter into - queues having a small number of cores signed up as owners - e.g. high priority bugfix, API, scheduler, object conversion, libvirt driver, vmware driver, etc. - we'd allow for a large number of queues so that cores could aim for an inbox zero approach on individual queues, something that would probably help keep cores motivated. - we could apply different scheduling policies to each of the different queues - i.e. explicit guidance for cores about which patches they should pick off the queue next. - we could track metrics on individual queues as well as the whole network, identifying bottlenecks and properly recognizing which reviewers are doing a small number of difficult reviews versus those doing a high number of trivial reviews. - we could require some queues to feed into a final approval queue where some people are responsible for giving an approved patch a final sanity check - i.e. there would be a class of reviewer with good instincts who quickly churn
Re: [openstack-dev] [devstack] [zmq] [oslo.messaging] Running devstack with zeromq
On Thu, 2014-06-19 at 14:29 +0200, Mehdi Abaakouk wrote: Hi, Le 2014-06-19 00:30, Ben Nemec a écrit : On 06/18/2014 05:45 AM, Elena Ezhova wrote: So I wonder whether it is something the community is interested in and, if yes, are there any recommendations concerning possible implementation? I can't speak to the specific implementation, but if we're going to keep the zmq driver in oslo.messaging then IMHO it should be usable with devstack, so +1 to making that work. Currently the zmq driver have a really bad test coverage, the driver is 'I think' broken since a while. Bugs like [1] or [2] let me think that nobody can use it currently. [1] https://bugs.launchpad.net/oslo.messaging/+bug/1301723 [2] https://bugs.launchpad.net/oslo.messaging/+bug/1330460 Also, an oslo.messaging rule is a driver must not force to use a eventloop library, but this one heavily use eventlet. So only the eventlet executor can works with it, not the blocking one or any future executor. I guess if someone is interested in, the first step is to fix the zmq driver, remove eventlet stuffs and write unit tests for it, before trying integration, and raise bugs that should be catch by unit/functionnal testing. If nobody is interested in zmq, perhaps we should just drop/deprecated/mark_as_broken it. Yes, I agree with all of that. Unless the situation improves rapidly, I think we should mark it as deprecated in Juno and plan to remove it in K. That might seem like an overly rapid deprecation cycle, but it is currently broken and unusable in Icehouse ... so no-one can be using it in Icehouse. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [glance] Unifying configuration file
Hey On Tue, 2014-06-17 at 17:43 +0200, Julien Danjou wrote: On Tue, Jun 17 2014, Arnaud Legendre wrote: @ZhiYan: I don't like the idea of removing the sample configuration file(s) from the git repository. Many people do not want to have to checkout the entire codebase and tox every time they have to verify a variable name in a configuration file. I know many people who were really frustrated when they realized that the sample config file was gone from the Nova repo. However, I agree with the fact that it would be better if the sample was 100% accurate: so the way I would love to see this working is to generate the sample file every time there is a config change (this being totally automated (maybe at the gate level...)). You're a bit late on this. :) So what I did these last months (year?) in several projects, is to check at gate time the configuration file that is automatically generated against what's in the patches. That turned out to be a real problem because sometimes some options change in the external modules we rely on (e.g. keystone authtoken or oslo.messaging). In the end many projects (like Nova) disabled this check altogether, and therefore removed the generated configuration file from the git repository. For those that casually want to refer to the sample config, what would help is if there were Jenkins jobs to publish the generated sample config file somewhere. For people installing the software, it would probably be nice if pbr added 'python setup.py sample_config' or something. @Julien: I would be interested to understand the value that you see of having only one config file? At this point, I don't see why managing one file is more complicated than managing several files especially when they are organized by categories. Also, scrolling through the registry settings every time I want to modify an api setting seems to add some overhead. Because there's no way to automatically generate several configuration files each with its own set of options using oslo.config. I think that's a failing of oslo.config, though. Glance's layout of config files is useful and intuitive. Glance is (one of?) the last projects in OpenStack to manually write its sample configuration files, which are not up to date obviously. Neutron too, but not split out per-service. I don't find Neutron's config file layout as intuitive. So really this is mainly about following what every other project did over the last year(s). There's a balance here between what makes technical sense and what helps users. If Glance has support for generating a unified config file while also manually maintaining the split configs, I think that's a fine compromise. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [glance] Unifying configuration file
On Wed, 2014-06-18 at 09:29 -0400, Doug Hellmann wrote: On Wed, Jun 18, 2014 at 1:58 AM, Mark McLoughlin mar...@redhat.com wrote: Hey On Tue, 2014-06-17 at 17:43 +0200, Julien Danjou wrote: On Tue, Jun 17 2014, Arnaud Legendre wrote: @Julien: I would be interested to understand the value that you see of having only one config file? At this point, I don't see why managing one file is more complicated than managing several files especially when they are organized by categories. Also, scrolling through the registry settings every time I want to modify an api setting seem to add some overhead. Because there's no way to automatically generate several configuration files with each its own set of options using oslo.config. I think that's a failing of oslo.config, though. Glance's layout of config files is useful and intuitive. The config generator lets you specify the modules, libraries, and files to be used to generate a config file. It even has a way to specify which files to ignore. So I think we have everything we need in the config generator, but we need to run it more than once, with different inputs, to generate multiple files. Yep, except the magic way we troll through the code, loading modules, introspecting what config options were registered, etc. will likely make this a frustrating experience to get right. I took a little time to hack up a much more simple and explicit approach to config file generation and posted a draft here: https://review.openstack.org/100946 The docstring at the top of the file explains the approach: https://review.openstack.org/#/c/100946/1/oslo/config/generator.py Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
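For illustration, the explicit approach in that review boils down to listing, per output file, the option namespaces it should contain - something along these lines, where the file name and namespaces are hypothetical for Glance:

    # etc/oslo-config-generator/glance-api.conf (hypothetical)
    [DEFAULT]
    output_file = etc/glance-api.conf.sample
    namespace = glance.api
    namespace = glance.store
    namespace = oslo.messaging
    namespace = keystonemiddleware.auth_token

    # generated with something like:
    #   oslo-config-generator --config-file etc/oslo-config-generator/glance-api.conf

Running the generator once per file with a different namespace list is what would let Glance keep its split api/registry/cache layout while still having every sample fully generated.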
Re: [openstack-dev] An alternative approach to enforcing expected election behaviour
On Mon, 2014-06-16 at 10:56 +0100, Daniel P. Berrange wrote: On Mon, Jun 16, 2014 at 05:04:51AM -0400, Eoghan Glynn wrote: How about we rely instead on the values and attributes that actually make our community strong? Specifically: maturity, honesty, and a self-correcting nature. How about we simply require that each candidate for a TC or PTL election gives a simple undertaking in their self-nomination mail, along the lines of: I undertake to respect the election process, as required by the community code of conduct. I also undertake not to engage in campaign practices that the community has considered objectionable in the past, including but not limited to, unsolicited mail shots and private campaign events. If my behavior during this election period does not live up to those standards, please feel free to call me out on it on this mailing list and/or withhold your vote. I like this proposal because it focuses on the carrot rather than the stick, which is ultimately better for community cohesiveness IMHO. I like it too. A slight tweak of that would be to require candidates to sign the pledge publicly via an online form. We could invite the community as a whole to sign it too in order to have candidates' supporters covered. It is already part of our community ethos that we can call people out to publically debate / stand up justify any all issues affecting the project whether they be related to the code, architecture, or non-technical issues such as electioneering behaviour. We then rely on: (a) the self-policing nature of an honest, open community and: (b) the maturity and sound judgement within that community giving us the ability to quickly spot and disregard any frivolous reports of mis-behavior So no need for heavy-weight inquisitions, no need to interrupt the election process, no need for handing out of stiff penalties such as termination of membership. Before jumping headlong for a big stick to whack people with, I think I'd expect to see examples of problems we've actually faced (as opposed to vague hypotheticals), and a clear illustration that a self-policing approach to the community interaction failed to address them. I've not personally seen/experianced any problems that are so severe that they'd suggest we need the ability to kick someone out of the community for sending email ! Indeed. This discussion is happening in a vacuum for many people who do not know the details of the private emails and private campaign events which happened in the previous cycle. The only one I know of first hand was a private email where the recipients quickly responded saying the email was out of line and the original sender apologized profusely. People can make mistakes in good faith and if we can deal with it quickly and maturely as a community, all the better. In this example, the sender's apology could have bee followed up with look, here's our code of conduct; sign it now, respect it in the future, and let that be the end of the matter. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo][messaging] messaging vs. messagingv2
Hi Ihar, On Mon, 2014-06-16 at 15:28 +0200, Ihar Hrachyshka wrote: Hi all, I'm currently pushing Neutron to oslo.messaging, and while at it, a question popped up. So in oslo-rpc, we have the following notification drivers available: neutron.openstack.common.notifier.log_notifier neutron.openstack.common.notifier.no_op_notifier neutron.openstack.common.notifier.rpc_notifier2 neutron.openstack.common.notifier.rpc_notifier neutron.openstack.common.notifier.test_notifier And in oslo.messaging, we have: oslo.messaging.notify._impl_log:LogDriver oslo.messaging.notify._impl_noop:NoOpDriver oslo.messaging.notify._impl_messaging:MessagingV2Driver oslo.messaging.notify._impl_messaging:MessagingDriver oslo.messaging.notify._impl_test:TestDriver My understanding is that they map to each other as in [1]. So atm Neutron uses rpc_notifier from oslo-rpc, so I'm going to replace it with MessagingDriver. So far so good. So far so good, indeed. But then I've checked docstrings for MessagingDriver and MessagingV2Driver [2], and the following looks suspicious to me. For MessagingDriver, it's said: This driver should only be used in cases where there are existing consumers deployed which do not support the 2.0 message format. This sounds like MessagingDriver is somehow obsolete, and we want to use MessagingV2Driver unless forced to. But I don't get what those consumers are. Are these other projects that interact with us via messaging bus? Another weird thing is that it seems that no other project is actually using MessagingV2Driver (at least those that I've checked). Is it even running in the wild? The idea is that deployments should move over to the v2 on-the-wire format, but we've never made any great efforts for that to happen. Part of the issue here is that notifications are consumed by codebases outside of OpenStack and, so, changing the default to v2 would likely unnecessarily disrupt some people. The reason no-one has pushed very hard on a firm deprecation plan for the v1 format is that the v2 format doesn't yet offer a huge amount of advantages. Right now it just adds a '2.0' version number to the format. When we gain the ability to sign notification messages, this will only be available via the v2 format and that will encourage more focus on switching over fully. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
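For anyone following the mechanics of the switch being discussed above: with oslo.messaging the notification drivers are selected by entry point name rather than by class path, so the rpc_notifier replacement in neutron.conf would look something like the snippet below. This is only an illustration assuming the stock entry point names oslo.messaging registers; check your installed version's setup.cfg for the definitive list.

    [DEFAULT]
    # equivalent of the old rpc_notifier, v1 on-the-wire format
    notification_driver = messaging

    # or, to opt in to the 2.0 format once all consumers can handle it:
    # notification_driver = messagingv2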
Re: [openstack-dev] Gate proposal - drop Postgresql configurations in the gate
On Fri, 2014-06-13 at 07:31 -0400, Sean Dague wrote: On 06/13/2014 02:36 AM, Mark McLoughlin wrote: On Thu, 2014-06-12 at 22:10 -0400, Dan Prince wrote: On Thu, 2014-06-12 at 08:06 -0400, Sean Dague wrote: We're definitely deep into capacity issues, so it's going to be time to start making tougher decisions about things we decide aren't different enough to bother testing on every commit. In order to save resources why not combine some of the jobs in different ways. So for example instead of: check-tempest-dsvm-full check-tempest-dsvm-postgres-full Couldn't we just drop the postgres-full job and run one of the Neutron jobs w/ postgres instead? Or something similar, so long as at least one of the jobs which runs most of Tempest is using PostgreSQL I think we'd be mostly fine. Not shooting for 100% coverage for everything with our limited resource pool is fine, lets just do the best we can. Ditto for gate jobs (not check). I think that's what Clark was suggesting in: https://etherpad.openstack.org/p/juno-test-maxtrices Previously we've been testing Postgresql in the gate because it has a stricter interpretation of SQL than MySQL. And when we didn't test Postgresql it regressed. I know, I chased it for about 4 weeks in grizzly. However Monty brought up a good point at Summit, that MySQL has a strict mode. That should actually enforce the same strictness. My proposal is that we land this change to devstack - https://review.openstack.org/#/c/97442/ and backport it to past devstack branches. Then we drop the pg jobs, as the differences between the 2 configs should then be very minimal. All the *actual* failures we've seen between the 2 were completely about this strict SQL mode interpretation. I suppose I would like to see us keep it in the mix. Running SmokeStack for almost 3 years I found many an issue dealing w/ PostgreSQL. I ran it concurrently with many of the other jobs and I too had limited resources (much less than what we have in infra today). Would MySQL strict SQL mode catch stuff like this (old bugs, but still valid for this topic I think): https://bugs.launchpad.net/nova/+bug/948066 https://bugs.launchpad.net/nova/+bug/1003756 Having support for and testing against at least 2 databases helps keep our SQL queries and migrations cleaner... and is generally a good practice given we have abstractions which are meant to support this sort of thing anyway (so by all means let us test them!). Also, having compacted the Nova migrations 3 times now I found many issues by testing on multiple databases (MySQL and PostgreSQL). I'm quite certain our migrations would be worse off if we just tested against the single database. Certainly sounds like this testing is far beyond the "might one day be useful" level Sean talks about. The migration compaction is a good point. And I'm happy to see there were some bugs exposed as well. Here is where I remain stuck. We are now at a failure rate in which it's 3 days (minimum) to land a fix that decreases our failure rate at all. The way we are currently solving this is by effectively building manual zuul and taking smart humans in coordination to end run around our system. We've merged 18 fixes so far - https://etherpad.openstack.org/p/gatetriage-june2014 this way. Merging a fix this way is at least an order of magnitude more expensive on people time because of the analysis and coordination we need to go through to make sure these things are the right things to jump the queue. That effort, over 8 days, has gotten us down to *only* a 24hr merge delay. 
And there are no more smoking guns. What's left is a ton of subtle things. I've got ~ 30 patches outstanding right now (a bunch are things to clarify what's going on in the build runs especially in the fail scenarios). Every single one of them has been failed by Jenkins at least once. Almost every one was failed by a different unique issue. So I'd say at best we're 25% of the way towards solving this. That being said, because of the deep queues, people are just recheck grinding (or hitting the jackpot and landing something through that then fails a lot after landing). That leads to bugs like this: https://bugs.launchpad.net/heat/+bug/1306029 Which was seen early in the patch - https://review.openstack.org/#/c/97569/ Then kind of destroyed us completely for a day - http://status.openstack.org/elastic-recheck/ (it's the top graph). And, predictably, a week into a long gate queue everyone is now grumpy. The sniping between projects, and within projects in assigning blame starts to spike at about day 4 of these events. Everyone assumes someone else is to blame for these things. So there is real community impact when we get to these states. So, I'm kind of burnt out trying to figure out how to get us out of this. As I do take
Re: [openstack-dev] [oslo] versioning and releases
On Thu, 2014-06-12 at 12:09 +0200, Thierry Carrez wrote: Doug Hellmann wrote: On Tue, Jun 10, 2014 at 5:19 PM, Mark McLoughlin mar...@redhat.com wrote: On Tue, 2014-06-10 at 12:24 -0400, Doug Hellmann wrote: [...] Background: We have two types of oslo libraries. Libraries like oslo.config and oslo.messaging were created by extracting incubated code, updating the public API, and packaging it. Libraries like cliff and taskflow were created as standalone packages from the beginning, and later adopted by the oslo team to manage their development and maintenance. Incubated libraries have been released at the end of a release cycle, as with the rest of the integrated packages. Adopted libraries have historically been released as needed during their development. We would like to synchronize these so that all oslo libraries are officially released with the rest of the software created by OpenStack developers. Could you outline the benefits of syncing with the integrated release? Sure! http://lists.openstack.org/pipermail/openstack-dev/2012-November/003345.html :) Personally I see a few drawbacks to this approach: We dump the new version on consumers usually around RC time, which is generally a bad time to push a new version of a dependency and detect potential breakage. Consumers just seem to get the new version at the worst possible time. It also prevents us from spreading the work all over the cycle. For example it may have been more successful to have the oslo.messaging new release by milestone-1 to make sure it's adopted by projects in milestone-2 or milestone-3... rather than have it ready by milestone-3 and expect all projects to use it by consuming alphas during the cycle. Now if *all* projects were continuously consuming alpha versions, most of those drawbacks would go away. Yes, that's the plan. Those issues are acknowledged and we're reasonably confident the alpha versions plan will address them. [...] Patch Releases: Updates to existing library releases can be made from stable branches. Checking out stable/icehouse of oslo.config for example would allow a release 1.3.1. We don't have a formal policy about whether we will create patch releases, or whether applications are better off using the latest release of the library. Do we need one? I'm not sure we need one, but if we did I'd expect them to be aligned with stable releases. Right now, I think they'd just be as-needed - if there's enough backported to the stable branch to warrant a release, we just cut one. That's pretty much what I thought, too. We shouldn't need to worry about alphas for patch releases, since we won't add features. Yes, I think we can be pretty flexible about it. But to come back to my above remark... should it be stable/icehouse or stable/1.3? It's a branch for bugfix releases of the icehouse version of the library, so I think stable/icehouse makes sense. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo][messaging] Further improvements and refactoring
Hi, On Tue, 2014-06-10 at 15:47 +0400, Dina Belova wrote: Dims, No problem with creating the specs, we just want to understand if the community is OK with our suggestions in general :) If so, I'll create the appropriate specs and we'll discuss them :) Personally, I find it difficult to understand the proposals as currently described and how they address the performance problems you say you see. The specs process should help flesh out your ideas so they are more understandable. On the other hand, it's pretty difficult to have an abstract conversation about code re-factoring. So, some combination of proof-of-concept patches and specs will probably work best. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][cinder][ceilometer][glance][all] Loading clients from a CONF object
On Wed, 2014-06-11 at 16:57 +1200, Steve Baker wrote: On 11/06/14 15:07, Jamie Lennox wrote: Among the problems caused by the inconsistencies in the clients is that all the options that are required to create a client need to go into the config file of the service. This is a pain to configure from the server side and can result in missing options as servers fail to keep up. With the session object standardizing many of these options there is the intention to make the session be loadable directly from a CONF object. A spec has been proposed to nova-specs[1] to outline the problem and the approach in more detail. The TL;DR version is that I intend to collapse all the options to load a client down such that each client will have one ini section that looks vaguely like: [cinder] cafile = '/path/to/cas' certfile = 'path/to/cert' timeout = 5 auth_name = v2password username = 'user' password = 'pass' This list of options is then managed from keystoneclient, thus servers will automatically have access to new transport options, authentication mechanisms and security fixes as they become available. The point of this email is to make people aware of this effort and that if accepted into nova-specs the same pattern will eventually make it to your service (as clients get updated and manpower allows). The review containing the config option names is still open[2] so if you wish to comment on particulars, please take a look. Please leave a comment on the reviews or reply to this email with concerns or questions. Thanks Jamie [1] https://review.openstack.org/#/c/98955/ [2] https://review.openstack.org/#/c/95015/ Heat already needs to have configuration options for every client, and we've gone with the following pattern: http://git.openstack.org/cgit/openstack/heat/tree/etc/heat/heat.conf.sample#n612 Do you have any objection to aligning with what we already have? Specifically: [clients_clientname] ca_file=... cert_file=... key_file=... Sounds like there's a good case for an Oslo API for creating client objects from configuration. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
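To make the consuming side of this a bit more concrete, here is a rough sketch of what a service might do with a [cinder]-style section, assuming the session-loading helpers end up close to what the spec proposes. The helper names, the config file path and the commented-out client constructor are assumptions for illustration, not settled API.

    from keystoneclient import session
    from oslo.config import cfg

    CONF = cfg.CONF

    # Register the transport options (cafile, certfile, timeout, ...)
    # under the [cinder] group, then parse the service's config file.
    session.Session.register_conf_options(CONF, 'cinder')
    CONF(['--config-file', '/etc/nova/nova.conf'])

    # Build a Session from whatever the operator put in [cinder].
    sess = session.Session.load_from_conf_options(CONF, 'cinder')

    # A client library could then simply be handed the session, e.g.:
    # cinder = cinderclient.Client('2', session=sess)

The attraction of an Oslo-level API here is that every service would parse these sections identically, rather than each project growing its own slightly different option names and loading code.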
Re: [openstack-dev] use of the word certified
On Mon, 2014-06-09 at 20:14 -0400, Doug Hellmann wrote: On Mon, Jun 9, 2014 at 6:11 PM, Eoghan Glynn egl...@redhat.com wrote: Based on the discussion I'd like to propose these options: 1. Cinder-certified driver - This is an attempt to move the certification to the project level. 2. CI-tested driver - This is probably the most accurate, at least for what we're trying to achieve for Juno: Continuous Integration of Vendor-specific Drivers. Hi Ramy, Thanks for these constructive suggestions. The second option is certainly a very direct and specific reflection of what is actually involved in getting the Cinder project's imprimatur. I do like tested. I'd like to understand what the foundation is planning for certification as well, to know how big of an issue this really is. Even if they aren't going to certify drivers, I have heard discussions around training and possibly other areas so I would hate for us to introduce confusion by having different uses of that term in similar contexts. Mark, do you know who is working on that within the board or foundation? http://blogs.gnome.org/markmc/2014/05/17/may-11-openstack-foundation-board-meeting/ Boris Renski raised the possibility of the Foundation attaching the trademark to a verified, certified or tested status for drivers. It wasn't discussed at length because board members hadn't been briefed in advance, but I think it's safe to say there was a knee-jerk negative reaction from a number of members. This is in the context of the DriverLog report: http://stackalytics.com/report/driverlog http://www.mirantis.com/blog/cloud-drivers-openstack-driverlog-part-1-solving-driver-problem/ http://www.mirantis.com/blog/openstack-will-open-source-vendor-certifications/ AIUI the CI tested phrase was chosen in DriverLog to avoid the controversial area Boris describes in the last link above. I think that makes sense. Claiming this CI testing replaces more traditional certification programs is a sure way to bog potentially useful collaboration down in vendor politics. Avoiding dragging the project into those sort of politics is something I'm really keen on, and why I think the word certification is best avoided so we can focus on what we're actually trying to achieve. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [marconi] Reconsidering the unified API model
On Mon, 2014-06-09 at 19:31 +, Kurt Griffiths wrote: Lately we have been talking about writing drivers for traditional message brokers that will not be able to support the message feeds part of the API. I’ve started to think that having a huge part of the API that may or may not “work”, depending on how Marconi is deployed, is not a good story for users, esp. in light of the push to make different clouds more interoperable. Perhaps the first point to get super clear on is why drivers for traditional message brokers are needed. What problems would such drivers address? Who would the drivers help? Would the Marconi team recommend using any of those drivers for a production queuing service? Would the subset of Marconi's API which is implementable by these drivers really be useful for application developers? I'd like to understand that in more detail because I worry the Marconi team is being pushed into adding these drivers without truly believing they will be useful. And, if so, that would not be a sane context in which to make a serious architectural change. OTOH if there are real, valid use cases for these drivers, then understanding those would inform the architecture decision. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] use of the word certified
On Tue, 2014-06-10 at 14:06 +0100, Duncan Thomas wrote: On 10 June 2014 09:33, Mark McLoughlin mar...@redhat.com wrote: Avoiding dragging the project into those sort of politics is something I'm really keen on, and why I think the word certification is best avoided so we can focus on what we're actually trying to achieve. Avoiding those sorts of politics - 'XXX says it is a certified config, it doesn't work, cinder is junk' - is why I'd rather the cinder core team had a certification program, at least we've some control then and *other* people can't impose their idea of certification on us. I think politics happens, whether you will it or not, so a far more sensible stance is to play it out in advance. Exposing which configurations are actively tested is a perfectly sane thing to do. I don't see why you think calling this certification is necessary to achieve your goals. I don't know what you mean by others imposing their idea of certification. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] use of the word certified
On Tue, 2014-06-10 at 16:09 +0100, Duncan Thomas wrote: On 10 June 2014 15:07, Mark McLoughlin mar...@redhat.com wrote: Exposing which configurations are actively tested is a perfectly sane thing to do. I don't see why you think calling this certification is necessary to achieve your goals. What is certification except a formal way of saying 'we tested it'? At least when you test it enough to have some degree of confidence in your testing. That's *exactly* what certification means. I disagree. I think the word has substantially more connotations than simply 'this has been tested'. http://lists.openstack.org/pipermail/openstack-dev/2014-June/036963.html I don't know what you mean by others imposing their idea of certification. I mean that if some company or vendor starts claiming 'Product X is certified for use with cinder', On what basis would any vendor claim such certification? that is bad for the cinder core team, since we didn't define what got tested or to what degree. That sounds like you mean 'Storage technology X is certified for use with Vendor Y OpenStack'? i.e. that Vendor Y has certified the driver for use with their version of OpenStack, but the Cinder team has no influence over what that means in practice? Whether we like it or not, when something doesn't work in cinder, it is rare for people to blame the storage vendor in their complaints. 'Cinder is broken' is what we hear (and I've heard it, even though what they meant is 'my storage vendor hasn't tested or updated their driver in two releases', that isn't what they /said/). Presumably people are complaining about that driver not working with some specific downstream version of OpenStack, right? Not e.g. stable/icehouse devstack or something? i.e. even aside from the driver, we're already talking about something we as an upstream project don't control the quality of. Since cinder, and therefore cinder-core, is going to get the blame, I feel we should try to maintain some degree of control over the claims. I'm starting to see where you're coming from, but I fear this certification thing will make it even worse. Right now you can easily shrug off any responsibility for the quality of a third party driver or an untested in-tree driver. Sure, some people may have unreasonable expectations about such things, but you can't stop people being idiots. You can better communicate expectations, though, and that's excellent. But as soon as you certify that driver, cinder-core takes on a responsibility that I would think is unreasonable even if the driver was tested. 'But you said it's certified!' Is cinder-core really ready to take on responsibility for every issue users see with certified drivers and downstream OpenStack products? If we run our own minimal certification program, which is what we've started doing (we started with a script which did a test run and tried to require vendors to run it; that didn't work out well, so we're now requiring CI integration instead), then we at least have the option of saying 'You're running a non-certified product, go talk to your vendor' when dealing with the cases we have no control over. Vendors that don't follow the CI cert requirements eventually get their driver removed, that simple. What about issues with a certified driver? Don't talk to the vendor, talk to us instead? If it's an out-of-tree driver then we say talk to your vendor. If it's an in-tree driver, those actively maintaining the driver provide best effort community support like anything else. 
If it's an in-tree driver and isn't being actively maintained, and best effort community support isn't being provided, then we need a way to communicate that unmaintained status. The level of testing it receives is what we currently see as the most important aspect, but it's not the only aspect. If the user is actually using a distro or other downstream product rather than pure upstream, it's completely normal for upstream to say talk to your distro maintainers or product vendor. Upstream projects can only provide limited support for even motivated and clueful users, particularly when those users are actually using downstream variants of the project. It certainly makes sense to clarify that, but a certification program will actually raise the expectations users have about the level of support upstream will provide. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [marconi] Reconsidering the unified API model
On Tue, 2014-06-10 at 17:33 +, Janczuk, Tomasz wrote: From my perspective the key promise of Marconi is to provide a *multi-tenant*, *HTTP* based queuing system. Think an OpenStack equivalent of SQS or Azure Storage Queues. As far as I know there are no off-the-shelf message brokers out there that fit that bill. Note that when I say "multi-tenant" I don't mean just having the multi-tenancy concept reflected in the APIs. The key aspect of the multi-tenancy is security hardening against a variety of attacks absent in single-tenant broker deployments. For example, an authenticated DOS attack. Nicely described. Now why is there a desire to implement these requirements using traditional message brokers? And what Marconi API semantics are impossible to implement using traditional message brokers? Either those semantics are fundamental requirements for this API, or the requirement to have support for traditional message brokers is the fundamental requirement. We can't have it both ways. My suspicion is the API semantics are seen by the Marconi team as the fundamental requirement, and the support for message brokers is very much a secondary concern. If that's the case, perhaps just label those drivers as experimental and not recommended and allow them to return a 501 Not Implemented? Yes, it sucks for portability, but all you're doing is creating space for experimenting with backing Marconi with a message broker ... not actually recommending it for deployment. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [openstack-tc] use of the word certified
Hi John, On Fri, 2014-06-06 at 13:59 -0600, John Griffith wrote: On Fri, Jun 6, 2014 at 1:55 PM, John Griffith john.griff...@solidfire.com wrote: On Fri, Jun 6, 2014 at 1:23 PM, Mark McLoughlin mar...@redhat.com wrote: On Fri, 2014-06-06 at 13:29 -0400, Anita Kuno wrote: The issue I have with the word certify is that it requires someone or a group of someones to attest to something. The thing attested to is only as credible as the someone or the group of someones doing the attesting. We have no process, nor do I feel we want to have a process for evaluating the reliability of the somones or groups of someones doing the attesting. I think that having testing in place in line with other programs' testing of patches (third party ci) in cinder should be sufficient to address the underlying concern, namely reliability of opensource hooks to proprietary code and/or hardware. I would like the use of the word certificate and all its roots to no longer be used in OpenStack programs with regard to testing. This won't happen until we get some discussion and agreement on this, which I would like to have. Thanks for bringing this up Anita. I agree that certified driver or similar would suggest something other than what I think we mean. Can you expand on the above comment? In other words a bit more about what you mean. I think from the perspective of a number of people that participate in Cinder the intent is in fact to say. Maybe it would help clear some things up for folks that don't see why this has become a debatable issue. Fair question. I didn't elaborate initially because I thought Anita covered it pretty well. By running CI tests successfully, it is in fact a way of certifying that our device and driver are in fact 'certified' to function appropriately and provide the same level of API and behavioral compatibility as the default components, as demonstrated by running CI tests on each submitted patch. My view is that certification is an attestation that someone can take the certified combination of a driver and whatever vendor product it is associated with, and the combination will be fit for purpose in any of the configurations that it supports. To achieve anything close to that, we'd need to be explicit about what distros, deployment tools, OpenStack configurations and vendor configurations must be supported. And it would be fairly strange for us to do that considering the way OpenStack just ships tarballs currently rather than a fully deployable thing. Also AIUI certification implies some level of warranty or guarantee, which goes against the pretty clear language WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND in our license :) Basically, I think there's a world of difference between what's expected of a certification body and what a technical community like ours should IMHO be undertaking in terms of providing information about how functional and maintained drivers are. (To be clear, I love that we're trying to surface information about how well maintained and tested drivers are) Personally I believe the contesting of the phrases and terms is partly due to the fact that a number of organizations have their own certification programs and tests. I think that's great, and they in fact provide some form of certification that a device works in their environment and to their expectations. Also fair, and I should be careful to be clear about my Red Hat bias on this. I am speaking here with my upstream hat on - i.e. 
thinking about what's good for the project, not necessarily Red Hat - but I'm definitely influenced about the meaning of certification by knowing a little about Red Hat's product certification program. Doing this from a general OpenStack integration perspective doesn't seem all that different to me. For the record, my initial response to this was that I didn't have too much preference on what it was called (verification, certification etc etc), however there seems to be a large number of people (not product vendors for what it's worth) that feel differently. On Fri, Jun 6, 2014 at 1:23 PM, Mark McLoughlin mar...@redhat.com wrote
Re: [openstack-dev] [Glance][TC] Glance Functional API and Cross-project API Consistency
On Fri, 2014-05-30 at 18:22 +, Hemanth Makkapati wrote: Hello All, I'm writing to notify you of the approach the Glance community has decided to take for doing functional API. Also, I'm writing to solicit your feedback on this approach in the light of cross-project API consistency. At the Atlanta Summit, the Glance team has discussed introducing functional API in Glance so as to be able to expose operations/actions that do not naturally fit into the CRUD-style. A few approaches are proposed and discussed here. We have all converged on the approach to include 'action' and action type in the URL. For instance, 'POST /images/{image_id}/actions/{action_type}'. However, this is different from the way Nova does actions. Nova includes action type in the payload. For instance, 'POST /servers/{server_id}/action {type: action_type, ...}'. At this point, we hit a cross-project API consistency issue mentioned here (under the heading 'How to act on resource - cloud perform on resources'). Though we are differing from the way Nova does actions and hence adding another source of cross-project API inconsistency, we have a few reasons to believe that Glance's way is helpful in certain ways. The reasons are as following: 1. Discoverability of operations. It'll be easier to expose permitted actions through schemas or a JSON home document living at /images/{image_id}/actions/. 2. More conducive for rate-limiting. It'll be easier to rate-limit actions in different ways if the action type is available in the URL. 3. Makes more sense for functional actions that don't require a request body (e.g., image deactivation). At this point we are curious to see if the API conventions group believes this is a valid and reasonable approach. It's obviously preferable if new APIs follow conventions established by existing APIs, but I think you've laid out pretty compelling rationale for not following Nova's lead on this. The question is whether Nova should plan on adopting this approach in a future version of its API? Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
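Purely to illustrate the difference under discussion, here is a sketch of the two request shapes from a client's point of view. The endpoints, token and payload below are made up for the example, and the Nova-style body is simplified to the {type: action_type} shape described above rather than any real Nova action.

    import json
    import requests

    headers = {'X-Auth-Token': 'TOKEN', 'Content-Type': 'application/json'}

    # Glance-style: the action type lives in the URL, no body needed
    requests.post('http://glance.example.com/v2/images/IMAGE_ID/actions/deactivate',
                  headers=headers)

    # Nova-style: a single /action URL, with the action type in the payload
    requests.post('http://nova.example.com/v2/TENANT_ID/servers/SERVER_ID/action',
                  headers=headers,
                  data=json.dumps({'type': 'deactivate'}))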
Re: [openstack-dev] [Glance] [TC] Program Mission Statement and the Catalog
On Wed, 2014-06-04 at 18:03 -0700, Mark Washenberger wrote: Hi folks, I'd like to propose the Images program to adopt a mission statement [1] and then change it to reflect our new aspirations of acting as a Catalog that works with artifacts beyond just disk images [2]. Since the Glance mini summit early this year, momentum has been building significantly behind the catalog effort and I think it's time we recognize it officially, to ensure further growth can proceed and to clarify the interactions the Glance Catalog will have with other OpenStack projects. Please see the linked openstack/governance changes, and provide your feedback either in this thread, on the changes themselves, or in the next TC meeting when we get a chance to discuss. Thanks to Georgy Okrokvertskhov for coming up with the new mission statement. Just quoting the proposal here to make the idea slightly more accessible, perhaps triggering some discussion here: https://review.openstack.org/98002 Artifact Repository Service: codename: Glance mission: To provide services to store, browse, share, distribute, and manage artifacts consumable by OpenStack services in a unified manner. An artifact is any strongly-typed, versioned collection of document and bulk, unstructured data and is immutable once the artifact is published in the repository. Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [oslo] Mehdi Abaakouk added to oslo.messaging-core
Mehdi has been making great contributions and reviews on oslo.messaging for months now, so I've added him to oslo.messaging-core. Thank you for all your hard work Mehdi! Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] use of the word certified
On Fri, 2014-06-06 at 13:29 -0400, Anita Kuno wrote: The issue I have with the word certify is that it requires someone or a group of someones to attest to something. The thing attested to is only as credible as the someone or the group of someones doing the attesting. We have no process, nor do I feel we want to have a process for evaluating the reliability of the somones or groups of someones doing the attesting. I think that having testing in place in line with other programs' testing of patches (third party ci) in cinder should be sufficient to address the underlying concern, namely reliability of opensource hooks to proprietary code and/or hardware. I would like the use of the word certificate and all its roots to no longer be used in OpenStack programs with regard to testing. This won't happen until we get some discussion and agreement on this, which I would like to have. Thanks for bringing this up Anita. I agree that certified driver or similar would suggest something other than what I think we mean. And, for whatever it's worth, the topic did come up at a Foundation board meeting and some board members expressed similar concerns, although I guess that was more precisely about the prospect of the Foundation calling drivers certified. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [horizon][infra] Plan for the splitting of Horizon into two repositories
On Thu, 2014-05-29 at 15:29 -0400, Anita Kuno wrote: On 05/28/2014 08:54 AM, Radomir Dopieralski wrote: Hello, we plan to finally do the split in this cycle, and I started some preparations for that. I also started to prepare a detailed plan for the whole operation, as it seems to be a rather big endeavor. You can view and amend the plan at the etherpad at: https://etherpad.openstack.org/p/horizon-split-plan It's still a little vague, but I plan to gradually get it more detailed. All the points are up for discussion, if anybody has any good ideas or suggestions, or can help in any way, please don't hesitate to add to this document. We still don't have any dates or anything -- I suppose we will work that out soonish. Oh, and great thanks to all the people who have helped me so far with it, I wouldn't even dream about trying such a thing without you. Also thanks in advance to anybody who plans to help! I'd like to confirm that we are all aware that this patch creates 16 new repos under the administration of horizon-ptl and horizon-core: https://review.openstack.org/#/c/95716/ If I'm late to the party and the only one that this is news to, that is fine. Sixteen additional repos seems like a lot of additional reviews will be needed. One slightly odd thing about this is that these repos are managed by horizon-core, so presumably part of the Horizon program, but yet the repos are under the stackforge/ namespace. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [horizon][infra] Plan for the splitting of Horizon into two repositories
On Thu, 2014-06-05 at 11:19 +0200, Radomir Dopieralski wrote: On 06/05/2014 10:59 AM, Mark McLoughlin wrote: If I'm late to the party and the only one that this is news to, that is fine. Sixteen additional repos seems like a lot of additional reviews will be needed. One slightly odd thing about this is that these repos are managed by horizon-core, so presumably part of the Horizon program, but yet the repos are under the stackforge/ namespace. What would you propose instead? Keeping them in repositories external to OpenStack, on github or bitbucket sounds wrong. Getting them under openstack/ doesn't sound good either, as the projects they are packaging are not related to OpenStack. Have them be managed by someone else? Who? If they're to be part of the Horizon program, I'd say they should be under openstack/. If not, perhaps create a new team to manage them. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] status of quota class
On Wed, 2014-02-19 at 10:27 -0600, Kevin L. Mitchell wrote: On Wed, 2014-02-19 at 13:47 +0100, Mehdi Abaakouk wrote: But 'quota_class' is never set when a nova RequestContext is created. When I created quota classes, I envisioned the authentication component of the WSGI stack setting the quota_class on the RequestContext, but there was no corresponding concept in Keystone. We need some means of identifying groups of tenants. So my question, what is the plan to finish the 'quota class' feature ? I currently have no plan to work on that, and I am not aware of any such work. Just for reference, we discussed the fact that this code was unused two years ago: https://lists.launchpad.net/openstack/msg12200.html and I see Joe has now completed the process of removing it again: https://review.openstack.org/75535 https://review.openstack.org/91480 https://review.openstack.org/91699 https://review.openstack.org/91700 Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] custom gerrit dashboard - per project review inbox zero
On Fri, 2014-05-09 at 08:20 -0400, Sean Dague wrote: Based on some of my blog posts on gerrit queries, I've built and gotten integrated a custom inbox zero dashboard which is per project in gerrit. ex: https://review.openstack.org/#/projects/openstack/nova,dashboards/important-changes:review-inbox-dashboard (replace openstack/nova with the project of your choice). This provides 3 sections. = Needs Final +2 = This is code that has an existing +2, no negative code review feedback, and positive jenkins score. So it's mergable if you provide the final +2. (Gerrit Query: status:open NOT label:Code-Review=0,self label:Verified=1,jenkins NOT label:Code-Review=-1 label:Code-Review=2 NOT label:Workflow=-1 limit:50 ) = No negative feedback = Changes that have no negative code review feedback, and positive jenkins score. (Gerrit Query: status:open NOT label:Code-Review=0,self label:Verified=1,jenkins NOT label:Code-Review=-1 NOT label:Workflow=-1 limit:50 ) = Wayward changes = Changes that have no code review feedback at all (no one has looked at it), a positive jenkins score, and are older than 2 days. (Gerrit Query: status:open label:Verified=1,jenkins NOT label:Workflow=-1 NOT label:Code-Review=2 age:2d) In all cases it filters out patches that you've commented on in the most recent revision. So as you vote on these things they will disappear from your list. Hopefully people will find this dashboard also useful. Nicely done. Any reason you've included the stable branches - i.e. not restricted it to branch:master ? Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Timestamp formats in the REST API
On Tue, 2014-04-29 at 10:39 -0400, Doug Hellmann wrote: On Tue, Apr 29, 2014 at 9:48 AM, Mark McLoughlin mar...@redhat.com wrote: Hey In this patch: https://review.openstack.org/83681 by Ghanshyam Mann, we encountered an unusual situation where a timestamp in the returned XML looked like this: 2014-04-08 09:00:14.399708+00:00 What appeared to be unusual was that the timestamp had both sub-second time resolution and timezone information. It was felt that this wasn't a valid timestamp format and then there was some debate about how to 'fix' it: https://review.openstack.org/87563 Anyway, this led me down a bit of a rabbit hole, so I'm going to attempt to document some findings. Firstly, some definitions: - Python's datetime module talks about datetime objects being 'naive' or 'aware' https://docs.python.org/2.7/library/datetime.html A datetime object d is aware if d.tzinfo is not None and d.tzinfo.utcoffset(d) does not return None. If d.tzinfo is None, or if d.tzinfo is not None but d.tzinfo.utcoffset(d) returns None, d is naive. (Most people will have encountered this already, but I'm including it for completeness) - The ISO8601 time and date format specifies timestamps like this: 2014-04-29T11:37:00Z with many variations. One distinguishing aspect of the ISO8601 format is the 'T' separating date and time. RFC3339 is very closely related and serves as easily accessible documentation of the format: http://www.ietf.org/rfc/rfc3339.txt - The Python iso8601 library allows parsing this time format, but also allows subtle variations that don't conform to the standard like omitting the 'T' separator: import iso8601 iso8601.parse_date('2014-04-29 11:37:00Z') datetime.datetime(2014, 4, 29, 11, 37, tzinfo=<iso8601.iso8601.Utc object at 0x214b050>) Presumably this is for the pragmatic reason that when you stringify a datetime object, the resulting string uses ' ' as a separator: import datetime str(datetime.datetime(2014, 4, 29, 11, 37)) '2014-04-29 11:37:00' And now some observations on what's going on in Nova: - We don't store timezone information in the database, but all our timestamps are relative to UTC nonetheless. - The objects code automatically adds the UTC to naive datetime objects: if value.utcoffset() is None: value = value.replace(tzinfo=iso8601.iso8601.Utc()) so code that is ported to objects may now be using aware datetime objects where they were previously using naive objects. - Whether we store sub-second resolution timestamps in the database appears to be database specific. In my quick tests, we store that information in sqlite but not MySQL. - However, timestamps added by SQLAlchemy when you do e.g. save() do include sub-second information, so some DB API calls may return sub-second timestamps even when that information isn't stored in the database. In our REST APIs, you'll essentially see one of three time formats. I'm calling them 'isotime', 'strtime' and 'xmltime': - 'isotime' - this is the result from timeutils.isotime(). It includes timezone information (i.e. a 'Z' prefix) but not microseconds. You'll see this in places where we stringify the datetime objects in the API layer using isotime() before passing them to the JSON/XML serializers. - 'strtime' - this is the result from timeutils.strtime(). It doesn't include timezone information but does include decimal seconds. This is what jsonutils.dumps() uses when we're serializing API responses - 'xmltime' or 'str(datetime)' format - this is just what you get when you stringify a datetime using str(). 
If the datetime is tz aware or includes non-zero microseconds, then that information will be included in the result. This is a significant difference versus the other two formats, where it is clear whether tz and microsecond information is included in the string. But there are some caveats: - I don't know how significant it is these days, but timestamps will be serialized to strtime format when going over RPC, but won't be de-serialized on the remote end. This could lead to a situation where the API layer tries to stringify a strtime formatted string using timeutils.isotime(). (see below for a description of those formats) - In at least one place - e.g. the 'updated' timestamp for v2 extensions - we hardcode the timestamp as strings in the code and don't currently use one of the formats above. My conclusions from all that: 1) This sucks 2) At the very least, we should be clear in our API samples tests which of the three
[openstack-dev] [nova] Timestamp formats in the REST API
Hey In this patch: https://review.openstack.org/83681 by Ghanshyam Mann, we encountered an unusual situation where a timestamp in the returned XML looked like this: 2014-04-08 09:00:14.399708+00:00 What appeared to be unusual was that the timestamp had both sub-second time resolution and timezone information. It was felt that this wasn't a valid timestamp format and then there was some debate about how to 'fix' it: https://review.openstack.org/87563 Anyway, this led me down a bit of a rabbit hole, so I'm going to attempt to document some findings. Firstly, some definitions: - Python's datetime module talks about datetime objects being 'naive' or 'aware' https://docs.python.org/2.7/library/datetime.html A datetime object d is aware if d.tzinfo is not None and d.tzinfo.utcoffset(d) does not return None. If d.tzinfo is None, or if d.tzinfo is not None but d.tzinfo.utcoffset(d) returns None, d is naive. (Most people will have encountered this already, but I'm including it for completeness) - The ISO8601 time and date format specifies timestamps like this: 2014-04-29T11:37:00Z with many variations. One distinguishing aspect of the ISO8601 format is the 'T' separating date and time. RFC3339 is very closely related and serves as easily accessible documentation of the format: http://www.ietf.org/rfc/rfc3339.txt - The Python iso8601 library allows parsing this time format, but also allows subtle variations that don't conform to the standard like omitting the 'T' separator: import iso8601 iso8601.parse_date('2014-04-29 11:37:00Z') datetime.datetime(2014, 4, 29, 11, 37, tzinfo=<iso8601.iso8601.Utc object at 0x214b050>) Presumably this is for the pragmatic reason that when you stringify a datetime object, the resulting string uses ' ' as a separator: import datetime str(datetime.datetime(2014, 4, 29, 11, 37)) '2014-04-29 11:37:00' And now some observations on what's going on in Nova: - We don't store timezone information in the database, but all our timestamps are relative to UTC nonetheless. - The objects code automatically adds the UTC to naive datetime objects: if value.utcoffset() is None: value = value.replace(tzinfo=iso8601.iso8601.Utc()) so code that is ported to objects may now be using aware datetime objects where they were previously using naive objects. - Whether we store sub-second resolution timestamps in the database appears to be database specific. In my quick tests, we store that information in sqlite but not MySQL. - However, timestamps added by SQLAlchemy when you do e.g. save() do include sub-second information, so some DB API calls may return sub-second timestamps even when that information isn't stored in the database. In our REST APIs, you'll essentially see one of three time formats. I'm calling them 'isotime', 'strtime' and 'xmltime': - 'isotime' - this is the result from timeutils.isotime(). It includes timezone information (i.e. a 'Z' prefix) but not microseconds. You'll see this in places where we stringify the datetime objects in the API layer using isotime() before passing them to the JSON/XML serializers. - 'strtime' - this is the result from timeutils.strtime(). It doesn't include timezone information but does include decimal seconds. This is what jsonutils.dumps() uses when we're serializing API responses - 'xmltime' or 'str(datetime)' format - this is just what you get when you stringify a datetime using str(). If the datetime is tz aware or includes non-zero microseconds, then that information will be included in the result. 
This is a significant difference versus the other two formats, where it is clear whether tz and microsecond information is included in the string. But there are some caveats: - I don't know how significant it is these days, but timestamps will be serialized to strtime format when going over RPC, but won't be de-serialized on the remote end. This could lead to a situation where the API layer tries to stringify a strtime formatted string using timeutils.isotime(). (see below for a description of those formats) - In at least one place - e.g. the 'updated' timestamp for v2 extensions - we hardcode the timestamp as strings in the code and don't currently use one of the formats above. My conclusions from all that: 1) This sucks 2) At the very least, we should be clear in our API samples tests which of the three formats we expect - we should only change the format used in a given part of the API after considering any compatibility considerations 3) We should unify on a single format in the v3 API - IMHO, we should be explicit about use of the UTC timezone and we should avoid including
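As a quick illustration of the three shapes described above, using the timestamp from the review that started this thread. The exact strftime patterns below are mine, chosen only to make the differences visible; they aren't necessarily the literal implementation of timeutils.isotime()/strtime().

    import datetime
    import iso8601

    dt = datetime.datetime(2014, 4, 8, 9, 0, 14, 399708,
                           tzinfo=iso8601.iso8601.Utc())

    # 'isotime' shape: explicit UTC marker, no microseconds
    print(dt.strftime('%Y-%m-%dT%H:%M:%S') + 'Z')   # 2014-04-08T09:00:14Z

    # 'strtime' shape: microseconds, but no timezone marker
    print(dt.strftime('%Y-%m-%dT%H:%M:%S.%f'))      # 2014-04-08T09:00:14.399708

    # 'xmltime' / str(datetime): whatever the object happens to carry
    print(str(dt))                                  # 2014-04-08 09:00:14.399708+00:00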
Re: [openstack-dev] [nova] Timestamp formats in the REST API
On Tue, 2014-04-29 at 14:48 +0100, Mark McLoughlin wrote: My conclusions from all that: 1) This sucks 2) At the very least, we should be clear in our API samples tests which of the three formats we expect - we should only change the format used in a given part of the API after considering any compatibility considerations 3) We should unify on a single format in the v3 API - IMHO, we should be explicit about use of the UTC timezone and we should avoid including microseconds unless there's a clear use case. In other words, we should use the 'isotime' format. 4) The 'xmltime' format is just a dumb historical mistake and since XML support is now firmly out of favor, let's not waste time improving the timestamp situation in XML. 5) We should at least consider moving to a single format in the v2 (JSON) API. IMHO, moving from strtime to isotime for fields like created_at and updated_at would be highly unlikely to cause any real issues for API users. (Following up this email with some patches that I'll link to, but I want to link to this email from the patches themselves) See here: https://review.openstack.org/#/q/project:openstack/nova+topic:timestamp-format,n,z Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Oslo] [Ironic] Can we change rpc_thread_pool_size default value?
On Wed, 2014-04-23 at 07:25 +0100, Mark McLoughlin wrote: On Tue, 2014-04-22 at 15:54 -0700, Devananda van der Veen wrote: Hi! When a project is using oslo.messaging, how can we change our default rpc_thread_pool_size? --- Background Ironic has hit a bug where a flood of API requests can deplete the RPC worker pool on the other end and cause things to break in very bad ways. Apparently, nova-conductor hit something similar a while back too. There've been a few long discussions on IRC about it, tracked partially here: https://bugs.launchpad.net/ironic/+bug/1308680 tldr; a way we can fix this is to set the rpc_thread_pool_size very small (eg, 4) and keep our conductor.worker_pool size near its current value (eg, 64). I'd like these to be the default option values, rather than require every user to change the rpc_thread_pool_size in their local ironic.conf file. We're also about to switch from the RPC module in oslo-incubator to using the oslo.messaging library. Why are these related? Because it looks impossible for us to change the default for this option from within Ironic, because the option is registered when EventletExecutor is instantiated (rather than loaded). https://github.com/openstack/oslo.messaging/blob/master/oslo/messaging/_executors/impl_eventlet.py#L76 It may have been possible for Ironic to set its own default before oslo.messaging, but it wouldn't have been recommended because there's no explicit API for doing so. With oslo.messaging, we have a set_transport_defaults() which shows how we'd approach adding this capability. The question comes down to whether this really is a situation where we need per-application defaults or just that the current defaults are screwed up. If the latter, I'd much rather just change the defaults. History is always useful :) Soren added the threadpool with a default size of 1024: https://code.launchpad.net/~soren/nova/rpc-threadpool/+merge/49896 Johannes changed it back to 64: https://review.openstack.org/6792 Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Oslo] [Ironic] Can we change rpc_thread_pool_size default value?
On Tue, 2014-04-22 at 15:54 -0700, Devananda van der Veen wrote: Hi! When a project is using oslo.messaging, how can we change our default rpc_thread_pool_size? --- Background Ironic has hit a bug where a flood of API requests can deplete the RPC worker pool on the other end and cause things to break in very bad ways. Apparently, nova-conductor hit something similar a while back too. There've been a few long discussions on IRC about it, tracked partially here: https://bugs.launchpad.net/ironic/+bug/1308680 tldr; a way we can fix this is to set the rpc_thread_pool_size very small (eg, 4) and keep our conductor.worker_pool size near its current value (eg, 64). I'd like these to be the default option values, rather than require every user to change the rpc_thread_pool_size in their local ironic.conf file. We're also about to switch from the RPC module in oslo-incubator to using the oslo.messaging library. Why are these related? Because it looks impossible for us to change the default for this option from within Ironic, because the option is registered when EventletExecutor is instantiated (rather than loaded). https://github.com/openstack/oslo.messaging/blob/master/oslo/messaging/_executors/impl_eventlet.py#L76 It may have been possible for Ironic to set its own default before oslo.messaging, but it wouldn't have been recommended because there's no explicit API for doing so. With oslo.messaging, we have a set_transport_defaults() which shows how we'd approach adding this capability. The question comes down to whether this really is a situation where we need per-application defaults or just that the current defaults are screwed up. If the latter, I'd much rather just change the defaults. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
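For the sake of discussion, here is a sketch of what a parallel helper could look like if we did decide per-application defaults are warranted, modelled on the set_transport_defaults() pattern. The function name and the option list below are hypothetical; this API does not exist in oslo.messaging today.

    from oslo.config import cfg

    # Stand-in for the option list the eventlet executor registers;
    # rpc_thread_pool_size lives there today with a default of 64.
    _executor_opts = [cfg.IntOpt('rpc_thread_pool_size', default=64)]

    def set_executor_defaults(rpc_thread_pool_size):
        """Hypothetical oslo.messaging API for overriding executor defaults."""
        cfg.set_defaults(_executor_opts,
                         rpc_thread_pool_size=rpc_thread_pool_size)

    # which would let Ironic do, early in startup, before any transport is used:
    #   set_executor_defaults(rpc_thread_pool_size=4)

The key point of the pattern is that the application overrides the *default* via an explicit library API rather than poking at option objects it doesn't own, so operators can still override the value in ironic.conf.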
Re: [openstack-dev] [oslo] nominating Victor Stinner for the Oslo core reviewers team
On Mon, 2014-04-21 at 12:39 -0400, Doug Hellmann wrote: I propose that we add Victor Stinner (haypo on freenode) to the Oslo core reviewers team. Victor is a Python core contributor, and works on the development team at eNovance. He created trollius, a port of Python 3's tulip/asyncio module to Python 2, at least in part to enable a driver for oslo.messaging. He has been quite active with Python 3 porting work in Oslo and some other projects, and organized a sprint to work on the port at PyCon last week. The patches he has written for the python 3 work have all covered backwards-compatibility so that the code continues to work as before under python 2. Given his background, skills, and interest, I think he would be a good addition to the team. Sounds good to me! Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] starting regular meetings
On Mon, 2014-04-14 at 14:53 -0400, Doug Hellmann wrote: Balancing Europe and Pacific TZs is going to be a challenge. I can't go at 1800 or 1900, myself, and those are pushing a little late in Europe anyway. How about 1600? http://www.timeanddate.com/worldclock/converted.html?iso=20140414T16p1=0p2=2133p3=195p4=224 We would need to move to another room, but that's not a big deal. Works for me. Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [OSSG][OSSN] OpenSSL Heartbleed vulnerability can lead to OpenStack compromise
On Thu, 2014-04-10 at 00:23 -0700, Nathan Kinder wrote: OpenSSL Heartbleed vulnerability can lead to OpenStack compromise --- ### Summary ### A vulnerability in OpenSSL can lead to leaking of confidential data protected by SSL/TLS in an OpenStack deployment. ### Affected Services / Software ### Grizzly, Havana, OpenSSL ### Discussion ### A vulnerability in OpenSSL code-named Heartbleed was recently discovered that allows remote attackers limited access to data in the memory of any service using OpenSSL to provide encryption for network communications. This can include key material used for SSL/TLS, which means that any confidential data that has been sent over SSL/TLS may be compromised. For full details, see the following website that describes this vulnerability in detail: http://heartbleed.com/ While OpenStack software itself is not directly affected, any deployment of OpenStack is very likely using OpenSSL to provide SSL/TLS functionality. ### Recommended Actions ### It is recommended that you immediately update OpenSSL software on the systems you use to run OpenStack services. Not sure if you want to mention it in this OSSN or consider doing it too, but clients are vulnerable to attack too. In most cases, you will want to upgrade to OpenSSL version 1.0.1g, though it is recommended that you review the exact affected version details on the Heartbleed website referenced above. After upgrading your OpenSSL software, you will need to restart any services that use the OpenSSL libraries. You can get a list of all processes that have the old version of OpenSSL loaded by running the following command: lsof | grep ssl | grep DEL Any processes shown by the above command will need to be restarted, or you can choose to restart your entire system if desired. In an OpenStack deployment, OpenSSL is commonly used to enable SSL/TLS protection for OpenStack API endpoints, SSL terminators, databases, message brokers, and Libvirt remote access. In addition to the native OpenStack services, some commonly used software that may need to be restarted includes: Apache HTTPD Libvirt MySQL Nginx PostgreSQL Pound Qpid RabbitMQ Stud It is also recommended that you treat your existing SSL/TLS keys as compromised and generate new keys. This includes keys used to enable SSL/TLS protection for OpenStack API endpoints, databases, message brokers, and libvirt remote access. Might be worth mentioning certificate revocation too. In addition, any confidential data such as credentials that have been sent over a SSL/TLS connection may have been compromised. It is recommended that cloud administrators change any passwords, tokens, or other credentials that may have been communicated over SSL/TLS. ### Contacts / References ### This OSSN : https://wiki.openstack.org/wiki/OSSN/OSSN-0012 OpenStack Security ML : openstack-secur...@lists.openstack.org OpenStack Security Group : https://launchpad.net/~openstack-ossg Heartbleed Website: http://heartbleed.com/ CVE: CVE-2014-0160 Very nicely done Nathan. Not really relevant to the OSSN, but perhaps people will find it interesting, I posted some thoughts on the wider fallout of heartbleed this morning: http://blogs.gnome.org/markmc/2014/04/10/heartbleed/ Thanks, Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [olso][neutron] proxying oslo.messaging from management network into tenant network/VMs
Hi,

On Wed, 2014-04-09 at 17:33 +0900, Isaku Yamahata wrote:
> Hello developers. As discussed many times so far[1], there are many
> projects that need to propagate RPC messages into VMs running on
> OpenStack. Neutron in my case.
>
> My idea is to relay RPC messages from the management network into the
> tenant network over a file-like object. By file-like object, I mean
> virtio-serial, unix domain socket, unix pipe and so on. I've written
> some code based on oslo.messaging[2][3] and documentation on use
> cases.[4][5] Only the file-like transport and message proxying would be
> in oslo.messaging; the agent-side code wouldn't be a part of
> oslo.messaging.
>
> use cases: ([5] for more figures)
>
> file-like object: virtio-serial, unix domain socket, unix pipe
>
> server - AMQP - agent in host - virtio-serial - guest agent in VM per VM
> server - AMQP - agent in host - unix socket/pipe - agent in tenant
> network - guest agent in VM
>
> So far there are security concerns about forwarding oslo.messaging from
> the management network into the tenant network. One approach is to
> allow only cast-RPC from the server to the guest agent in the VM, so
> that the guest agent only receives messages and can't send anything to
> servers. With a unix pipe, it's write-only for the server, read-only
> for the guest agent.
>
> Thoughts? comments?

Nice work. This is a pretty gnarly topic, but I think you're doing a good job thinking through a good solution here.

The advantage this has over Marconi is that it avoids relying on something which might not be commonplace in OpenStack deployments for a number of releases yet.

Using vmchannel/virtio-serial to talk to an oslo.messaging proxy server (which would have a configurable security policy) over a unix socket oslo.messaging transport, in order to allow limited bridging from the tenant network to the management network ... definitely sounds like a reasonable proposal.

Looking forward to your session at the summit! I also hope to look at your patches before then.

Thanks, Mark.

> Details of Neutron NFV use case[6]:
>
> Neutron services so far typically run agents in the host; the host
> agent receives RPCs from the neutron server, then it executes necessary
> operations. Sometimes the agent in the host issues RPCs to the neutron
> server periodically (e.g. status reports etc). It's desirable to make
> such services virtualized as Network Function Virtualization (NFV),
> i.e. make those features run in VMs. So it's quite a natural approach
> to propagate those RPC messages into agents in VMs.
>
> [1] https://wiki.openstack.org/wiki/UnifiedGuestAgent
> [2] https://review.openstack.org/#/c/77862/
> [3] https://review.openstack.org/#/c/77863/
> [4] https://blueprints.launchpad.net/oslo.messaging/+spec/message-proxy-server
> [5] https://wiki.openstack.org/wiki/Oslo/blueprints/message-proxy-server
> [6] https://blueprints.launchpad.net/neutron/+spec/adv-services-in-vms

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
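As an illustrative aside (this is not the proposed oslo.messaging file-like transport): the write-only/read-only property described above can be sketched with a plain unix pipe, where the host-side proxy holds only the write end and the guest agent holds only the read end. The method name and arguments below are made up for the example.

    # Minimal sketch of the "cast only" one-way channel discussed above,
    # using a plain unix pipe from the standard library.  It only illustrates
    # the security property: the server side can write, the guest-agent side
    # can only read, so the guest cannot send anything back to the server.
    import json
    import os

    read_fd, write_fd = os.pipe()

    def host_cast(method, **kwargs):
        # Host/proxy side: fire-and-forget, no reply expected or possible.
        msg = json.dumps({"method": method, "args": kwargs}) + "\n"
        os.write(write_fd, msg.encode("utf-8"))

    def guest_agent_poll():
        # Guest side: consume whatever the host has cast and dispatch it.
        for line in os.read(read_fd, 4096).decode("utf-8").splitlines():
            msg = json.loads(line)
            print("dispatching %s(%s)" % (msg["method"], msg["args"]))

    # Hypothetical example message; the method name and arguments are made
    # up purely for illustration.
    host_cast("port_update", port_id="1234", status="ACTIVE")
    guest_agent_poll()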
Re: [openstack-dev] [oslo] use of the oslo namespace package
On Mon, 2014-04-07 at 15:24 -0400, Doug Hellmann wrote:
> We can avoid adding to the problem by putting each new library in its
> own package. We still want the Oslo name attached for libraries that
> are really only meant to be used by OpenStack projects, and so we need
> a naming convention. I'm not entirely happy with the crammed together
> approach for oslotest and oslosphinx. At one point Dims and I talked
> about using a prefix oslo_ instead of just oslo, so we would have
> oslo_db, oslo_i18n, etc. That's also a bit ugly, though. Opinions?

Uggh :)

> Given the number of problems we have now (I help about 1 dev per week
> unbreak their system),

I've seen you do this - kudos on your patience.

> I think we should also consider renaming the existing libraries to not
> use the namespace package. That isn't a trivial change, since it will
> mean updating every consumer as well as the packaging done by distros.
> If we do decide to move them, I will need someone to help put together
> a migration plan. Does anyone want to volunteer to work on that?

One thing to note for any migration plan on this - we should use a new pip package name for the new version so people with e.g. oslo.config>=1.2.0 don't automatically get updated to a version which has the code in a different place. You would need to change to e.g. osloconfig>=1.4.0 instead.

> Before we make any changes, it would be good to know how bad this
> problem still is. Do developers still see issues on clean systems, or
> are all of the problems related to updating devstack boxes? Are people
> figuring out how to fix or work around the situation on their own? Can
> we make devstack more aggressive about deleting oslo libraries before
> re-installing them? Are there other changes we can make that would be
> less invasive?

I don't have any great insight, but hope we can figure something out. It's crazy to think that even though namespace packages appear to work pretty well initially, it might end up being so unworkable we would need to switch.

Mark.

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
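Purely to make the consumer-side impact concrete, and assuming, only for illustration, that a renamed oslo.config would become importable as oslo_config (one of the naming options floated above, not a decision from this thread), a consumer could carry a small compatibility shim during any migration:

    # Hypothetical compatibility shim for a consumer of oslo.config during a
    # rename away from the namespace package.  The "oslo_config" module name
    # is an assumption made only for this sketch; the cfg.StrOpt / CONF
    # registration API shown is the existing oslo.config API.
    try:
        from oslo_config import cfg      # renamed, non-namespaced package
    except ImportError:
        from oslo.config import cfg      # current namespaced package

    opts = [cfg.StrOpt("example_opt", default="unset",
                       help="Hypothetical option used only for this sketch")]
    cfg.CONF.register_opts(opts)
    print(cfg.CONF.example_opt)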
[openstack-dev] oslo.messaging 1.3.0 released
Hi

oslo.messaging 1.3.0 is now available on pypi and should be available in our mirror shortly. Full release notes are available here:

http://docs.openstack.org/developer/oslo.messaging/

The master branch will soon be open for Juno targeted development and we'll publish 1.4.0aN beta releases from master before releasing 1.4.0 for the Juno release. A stable/icehouse branch will be created for important bugfixes that will be released as 1.3.N.

Thanks, Mark.

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [depfreeze] [horizon] Exception request: python-keystoneclient>=0.7.0
On Thu, 2014-03-27 at 13:53 +0000, Julie Pichon wrote:
> Hi,
>
> I would like to request a depfreeze exception to bump up the keystone
> client requirement [1], in order to re-enable the ability for users to
> update their own password with Keystone v3 in Horizon in time for
> Icehouse [2]. This capability is requested by end-users quite often but
> had to be deactivated at the end of Havana due to some issues that are
> now resolved, thanks to the latest keystone client release.
>
> Since this is a library we control, hopefully this shouldn't cause too
> much trouble for packagers. Thank you for your consideration.
>
> Julie
>
> [1] https://review.openstack.org/#/c/83287/
> [2] https://review.openstack.org/#/c/59918/

IMHO, it's hard to imagine Icehouse requiring a more recent version of keystoneclient being a problem or risk for anyone.

Mark.

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
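A small illustrative aside, not part of the review itself: once the minimum is raised to >=0.7.0, a deployer or packager can sanity-check an environment against it with the standard setuptools pkg_resources API. The check below is just a sketch.

    # Illustrative sketch only: verify that the installed
    # python-keystoneclient satisfies the proposed >=0.7.0 minimum before
    # relying on the re-enabled Keystone v3 password-change support.
    import pkg_resources

    try:
        pkg_resources.require("python-keystoneclient>=0.7.0")
        print("python-keystoneclient is new enough")
    except (pkg_resources.DistributionNotFound,
            pkg_resources.VersionConflict) as exc:
        print("requirement not satisfied: %s" % exc)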
Re: [openstack-dev] Multiple patches in one review
On Mon, 2014-03-24 at 10:49 -0400, Russell Bryant wrote:
> Gerrit support for a patch series could certainly be better.

There has long been talk of gerrit getting topic review functionality, whereby you could e.g. approve a whole series of patches from a topic view. See:

https://code.google.com/p/gerrit/issues/detail?id=51
https://groups.google.com/d/msg/repo-discuss/5oRra_tLKMA/rxwU7pPAQE8J

My understanding is there's a fork of gerrit out there with this functionality that some projects are using successfully.

Mark.

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] OpenStack vs. SQLA 0.9
FYI, allowing 0.9 recently merged into openstack/requirements:

https://review.openstack.org/79817

This is a good example of how we should be linking gerrit and mailing list discussions together more. I don't think the gerrit review was linked in this thread nor was the mailing list discussion linked in the gerrit review.

Mark.

On Thu, 2014-03-13 at 22:45 -0700, Roman Podoliaka wrote:
Hi all, I think it's actually not that hard to fix the errors we have when using SQLAlchemy 0.9.x releases. I uploaded two changes to Nova to fix unit tests:
- https://review.openstack.org/#/c/80431/ (this one should also fix the Tempest test run error)
- https://review.openstack.org/#/c/80432/
Thanks, Roman

On Thu, Mar 13, 2014 at 7:41 PM, Thomas Goirand z...@debian.org wrote:
On 03/14/2014 02:06 AM, Sean Dague wrote:
On 03/13/2014 12:31 PM, Thomas Goirand wrote:
On 03/12/2014 07:07 PM, Sean Dague wrote:
Because of where we are in the freeze, I think this should wait until Juno opens to fix. Icehouse will only be compatible with SQLA 0.8, which I think is fine. I expect the rest of the issues can be addressed during Juno 1. -Sean

Sean, No, it's not fine for me. I'd like things to be fixed so we can move forward. Debian Sid has SQLA 0.9, and Jessie (the next Debian stable) will be released with SQLA 0.9 and with Icehouse, not Juno.

We're past freeze, and this requires deep changes in Nova DB to work. So it's not going to happen. Nova provably does not work with SQLA 0.9, as seen in Tempest tests. -Sean

It'd be nice if we considered more the fact that OpenStack, at some point, gets deployed on top of distributions... :/ Anyway, if we can't do it because of the freeze, then I will have to carry the patch in the Debian package. Nevertheless, someone will have to work and fix it. If you know how to help, it'd be very nice if you proposed a patch, even if we don't accept it before Juno opens.

Thomas Goirand (zigo)

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
On Thu, 2014-03-20 at 01:28 +0000, Joshua Harlow wrote:
> Proxying from yahoo's open source director (since he wasn't initially
> subscribed to this list, afaik he now is) on his behalf.
>
> From Gil Yehuda (Yahoo's Open Source director):
>
> I would urge you to avoid creating a dependency between Openstack code
> and any AGPL project, including MongoDB. MongoDB is licensed in a very
> strange manner that is prone to creating unintended licensing mistakes
> (a lawyer's dream). Indeed, MongoDB itself presents Apache licensed
> drivers - and thus technically, users of those drivers are not impacted
> by the AGPL terms. MongoDB Inc. is in the unique position to license
> their drivers this way (although they appear to violate the AGPL
> license) since MongoDB is not going to sue themselves for their own
> violation. However, others in the community who create MongoDB drivers
> are licensing those drivers under the Apache and MIT licenses - which
> does pose a problem. Why?
>
> The AGPL considers 'Corresponding Source' to be defined as "the source
> code for shared libraries and dynamically linked subprograms that the
> work is specifically designed to require, such as by intimate data
> communication or control flow between those subprograms and other parts
> of the work." Database drivers *are* work that is designed to require
> by intimate data communication or control flow between those
> subprograms and other parts of the work. So anyone using MongoDB with
> any other driver now invites an unknown -- that one court case, one
> judge, can read the license under its plain meaning and decide that
> AGPL terms apply as stated. We have no way to know how far they apply
> since this license has not been tested in court yet. Despite all the
> FAQs MongoDB puts on their site indicating they don't really mean to
> assert the license terms, normally when you provide a license, you mean
> those terms. If they did not mean those terms, they would not use this
> license. I hope they intended to do something good (to get
> contributions back without impacting applications using their database)
> but, even good intentions have unintended consequences.
>
> Companies with deep enough pockets to be lawsuit targets, and companies
> who want to be good open source citizens, face the problem that using
> MongoDB anywhere invites the future risk of legal catastrophe. A simple
> development change in an open source project can change the economics
> drastically. This is simply unsafe and unwise. OpenStack's ecosystem is
> fueled by the interests of many commercial ventures who wish to
> cooperate in the open source manner, but then leverage commercial
> opportunities they hope to create. I suggest that using MongoDB
> anywhere in this project will result in a loss of opportunity -- real
> or perceived, that would outweigh the benefits MongoDB itself provides.
>
> tl;dr version: If you want to use MongoDB in your company, that's your
> call. Please don't turn anyone who uses OpenStack components into
> unsuspecting MongoDB users. Instead, decouple the database from the
> project. It's not worth the legal risk, nor the impact on the
> Apache-ness of this project.

Thanks for that, Josh and Gil.

Rather than cross-posting, I think this MongoDB/AGPLv3 discussion should continue on the legal-discuss mailing list:

http://lists.openstack.org/pipermail/legal-discuss/2014-March/thread.html#174

Bear in mind that we (OpenStack, as a project and community) need to judge whether this is a credible concern or not.
If some users said they were only willing to deploy Apache licensed code in their organization, we would dismiss that notion pretty quickly. Is this AGPLv3 concern sufficiently credible that OpenStack needs to take it into account when making important decisions? That's what I'm hoping to get to in the legal-discuss thread. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev