Re: [openstack-dev] [tc][ceilometer] Some background on the gnocchi project
On 8/11/2014 4:22 PM, Eoghan Glynn wrote:

Hi Eoghan, Thanks for the note below. However, one thing the overview below does not cover is why InfluxDB ( http://influxdb.com/ ) is not being leveraged. Many folks feel that this technology is a viable solution for the problem space discussed below.

Great question Brad! As it happens we've been working closely with Paul Dix (lead developer of InfluxDB) to ensure that this metrics store would be usable as a backend driver. That conversation actually kicked off at the Juno summit in Atlanta, but it really got off the ground at our mid-cycle meet-up in Paris in early July. ... The InfluxDB folks have committed to implementing those features over July and August, and have made concrete progress on that score. I hope that provides enough detail to answer your question?

I guess it begs the question: if InfluxDB will do what you want, and it's open source (MIT) as well as commercially supported, how does gnocchi differentiate?

Cheers, Eoghan

Thanks, Brad

Brad Topol, Ph.D. IBM Distinguished Engineer OpenStack (919) 543-0646 Internet: bto...@us.ibm.com Assistant: Kendra Witherspoon (919) 254-0680

From: Eoghan Glynn egl...@redhat.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Date: 08/06/2014 11:17 AM Subject: [openstack-dev] [tc][ceilometer] Some background on the gnocchi project

Folks, It's come to our attention that some key individuals are not fully up-to-date on gnocchi activities. Since it's a good and healthy thing to ensure we're as communicative as possible about our roadmap, I've provided a high-level overview here of our thinking. This is intended as a precursor to further discussion with the TC. Cheers, Eoghan

What gnocchi is:
===
Gnocchi is a separate, but related, project spun up on stackforge by Julien Danjou, with the objective of providing efficient storage and retrieval of timeseries-oriented data and resource representations.
The goal is to experiment with a potential approach to addressing an architectural misstep made in the very earliest days of ceilometer, specifically the decision to store snapshots of some resource metadata alongside each metric datapoint. The core idea is to move to storing datapoints shorn of metadata, and instead allow the resource-state timeline to be reconstructed more cheaply from much less frequently occurring events (e.g. instance resizes or migrations).

What gnocchi isn't:
==
Gnocchi is not a large-scale under-the-radar rewrite of a core OpenStack component along the lines of keystone-lite. The change is concentrated on the final data-storage phase of the ceilometer pipeline, so will have little initial impact on the data-acquiring agents, or on the transformation phase. We've been totally open at the Atlanta summit and other forums about this approach being a multi-cycle effort.

Why we decided to do it this way:
The intent behind spinning up a separate project on stackforge was to allow the work to progress at arm's length from ceilometer, allowing normalcy to be maintained on the core project and a rapid rate of innovation on gnocchi. Note that the developers primarily contributing to gnocchi represent a cross-section of the core team, and there's a regular feedback loop in the form of a recurring agenda item at the weekly team meeting to avoid the effort becoming siloed.

But isn't re-architecting frowned upon?
==
Well, the architectures of other OpenStack projects have also undergone change as the community understanding of the implications of prior design decisions has evolved. Take for example the moves towards nova no-db-compute and the unified-object-model in order to address issues in the nova architecture that made progress towards rolling upgrades unnecessarily difficult. The point, in my understanding, is not to avoid doing the course-correction where it's deemed necessary.
Rather, the principle is more that these corrections happen in an open and planned way.

The path forward:
A subset of the ceilometer community will continue to work on gnocchi in parallel with the ceilometer core over the remainder of the Juno cycle and into the Kilo timeframe. The goal is to have an initial implementation of gnocchi ready for tech preview by the end of Juno, and to have the integration/migration/co-existence questions addressed in Kilo. Moving the ceilometer core to using gnocchi will be contingent on it demonstrating the required performance characteristics and providing the semantics needed to support a v3 ceilometer API that's fit-for-purpose.

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org
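The split described above — bare datapoints plus a sparse event stream from which the resource-state timeline is reconstructed — can be sketched roughly as follows. This is a toy illustration only; every class and function name here is invented, not gnocchi's actual data model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Datapoint:
    """A metric sample stored shorn of resource metadata."""
    metric: str       # e.g. "cpu_util"
    timestamp: float  # seconds since epoch
    value: float

@dataclass(frozen=True)
class ResourceEvent:
    """A much less frequent state-change event (resize, migration, ...)."""
    resource_id: str
    timestamp: float
    metadata: dict

def state_at(events, resource_id, when):
    """Reconstruct a resource's metadata at time `when` by replaying
    the latest event at or before that instant."""
    state = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.resource_id == resource_id and ev.timestamp <= when:
            state = ev.metadata
    return state

events = [
    ResourceEvent("inst-1", 100.0, {"flavor": "m1.small"}),
    ResourceEvent("inst-1", 500.0, {"flavor": "m1.large"}),  # a resize
]
```

With this shape, each sample stays small and uniform, and `state_at(events, "inst-1", 300.0)` recovers the pre-resize flavor without metadata having been copied onto every datapoint.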
Re: [openstack-dev] [tc][ceilometer] Some background on the gnocchi project
On 8/11/2014 5:29 PM, Eoghan Glynn wrote: [snip earlier InfluxDB discussion, quoted in full above]

I guess it begs the question: if InfluxDB will do what you want, and it's open source (MIT) as well as commercially supported, how does gnocchi differentiate?

Hi Sandy, One of the ideas behind gnocchi is to combine resource representation and timeseries-oriented storage of metric data, providing an efficient and convenient way to query for metric data associated with individual resources.

Doesn't InfluxDB do the same?

Also, having an API layered above the storage driver avoids locking in directly with a particular metrics-oriented DB, allowing for the potential to support multiple storage driver options (e.g. to choose between a canonical implementation based on Swift, an InfluxDB driver, and an OpenTSDB driver, say).

Right, I'm not suggesting to remove the storage abstraction layer. I'm just curious what gnocchi does better/differently than InfluxDB? Or, am I missing the objective here and gnocchi is the abstraction layer and not an InfluxDB alternative? If so, my apologies for the confusion.
A less compelling reason would be to provide a well-defined hook point to innovate with aggregation/analytic logic not supported natively in the underlying drivers (e.g. period-spanning statistics such as exponentially-weighted moving averages or even Holt-Winters).

Cheers, Eoghan

[snip re-quoted original announcement, reproduced in full above]
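For concreteness, the exponentially-weighted moving average mentioned above is only a few lines of generic textbook code (this sketch is unrelated to any actual gnocchi implementation):

```python
def ewma(values, alpha=0.5):
    """Exponentially-weighted moving average of a sample series.

    alpha in (0, 1]: larger values weight recent samples more heavily.
    Returns the running average after each sample.
    """
    avg = None
    out = []
    for v in values:
        # First sample seeds the average; afterwards blend new vs. old.
        avg = v if avg is None else alpha * v + (1 - alpha) * avg
        out.append(avg)
    return out
```

For example, `ewma([1, 2, 3])` yields `[1, 1.5, 2.25]` with the default alpha of 0.5. The point of the hook is that a statistic like this spans aggregation periods, so it can't simply be pushed down to drivers that only aggregate within a period.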
Re: [openstack-dev] [tc][ceilometer] Some background on the gnocchi project
On 8/11/2014 6:49 PM, Eoghan Glynn wrote: [snip earlier quoted exchange, reproduced in full above]

Doesn't InfluxDB do the same?

InfluxDB stores timeseries data primarily. Gnocchi is intended to store strongly-typed OpenStack resource representations (instances, images, etc.) in addition to providing a means to access timeseries data associated with those resources. So to answer your question: no, IIUC, it doesn't do the same thing.

Ok, I think I'm getting closer on this. Thanks for the clarification. Sadly, I have more questions :) Is this closer: a metadata repo for resources (instances, images, etc.) + an abstraction to some TSDB(s)? Hmm, thinking out loud ... if it's a metadata repo for resources, who is the authoritative source for what the resource is? Ceilometer/Gnocchi or the source service?
For example, if I want to query instance power state do I ask ceilometer or Nova? Or is it metadata about the time-series data collected for that resource? In which case, I think most TSDBs have some sort of series-description facility. I guess my question is: what makes this metadata unique, and how would it differ from the metadata ceilometer already collects? Will it be using Glance, now that Glance is becoming a pure metadata repo?

Though of course these things are not a million miles from each other; one is just a step up in the abstraction stack, having a wider and more OpenStack-specific scope.

Could it be a generic timeseries service? Is it openstack-specific because it uses stackforge/python/oslo? I assume the rules and schemas will be data-driven (vs. hard-coded)? ... and since the ceilometer collectors already do the bridge work, is it a pre-packaging of definitions that target openstack specifically? (not sure about wider and more specific) Sorry if this was already hashed out in Atlanta.

Also, having an API layered above the storage driver avoids locking in directly with a particular metrics-oriented DB, allowing for the potential to support multiple storage driver options (e.g. to choose between a canonical implementation based on Swift, an InfluxDB driver, and an OpenTSDB driver, say).

Right, I'm not suggesting to remove the storage abstraction layer. I'm just curious what gnocchi does better/differently than InfluxDB? Or, am I missing the objective here and gnocchi is the abstraction layer and not an InfluxDB alternative? If so, my apologies for the confusion.

No worries :) The intention is for gnocchi to provide an abstraction over timeseries, aggregation, downsampling and archiving/retention policies, with a number of drivers mapping onto real timeseries storage options. One of those drivers is based on Swift, another is in the works based on InfluxDB, and a third based on OpenTSDB has also been proposed.
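That driver-behind-an-API arrangement might look roughly like this. The interface and class names are invented for illustration; this is not gnocchi's real driver API:

```python
import abc

class TimeSeriesDriver(abc.ABC):
    """The API layer programs against this interface, so backends
    (Swift, InfluxDB, OpenTSDB, ...) remain interchangeable."""

    @abc.abstractmethod
    def write(self, metric, timestamp, value):
        """Store one sample for the named metric."""

    @abc.abstractmethod
    def fetch(self, metric, start, end):
        """Return (timestamp, value) pairs with start <= t < end."""

class InMemoryDriver(TimeSeriesDriver):
    """Toy reference backend, useful for tests."""

    def __init__(self):
        self._series = {}

    def write(self, metric, timestamp, value):
        self._series.setdefault(metric, []).append((timestamp, value))

    def fetch(self, metric, start, end):
        return [(t, v) for t, v in sorted(self._series.get(metric, ()))
                if start <= t < end]
```

Swapping the in-memory backend for a Swift- or InfluxDB-backed one would then be invisible to API consumers, which is the lock-in-avoidance argument in a nutshell.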
Cheers, Eoghan
Re: [openstack-dev] Announcing CloudKitty : an OpenSource Rating-as-a-Service project for OpenStack
Sounds very interesting. We're currently collecting detailed (and verified) usage information in StackTach and are keen to see what CloudKitty is able to offer. My one wish is that you keep the components as small pip redistributables with low coupling to promote reuse with other projects. Many tiny repos and clear APIs (internal and external) are good for adoption and contribution. All the best! -Sandy

From: Christophe Sauthier [christophe.sauth...@objectif-libre.com] Sent: Wednesday, August 13, 2014 10:40 AM To: openstack-dev@lists.openstack.org Subject: [openstack-dev] Announcing CloudKitty : an OpenSource Rating-as-a-Service project for OpenStack

We are very pleased at Objectif Libre to introduce CloudKitty, an effort to provide a fully OpenSource Rating-as-a-Service component in OpenStack. Following a first POC presented during the last summit in Atlanta to some Ceilometer devs (thanks again Julien Danjou for your great support!), we continued our effort to create a real service for rating. Today we are happy to share it with you all.

So what do we propose in CloudKitty?
- a service for collecting metrics (using the Ceilometer API)
- a modular rating architecture to enable/disable modules and create your own rules on-the-fly, allowing you to use the rating patterns you like
- an API to interact with the whole environment, from core components to every rating module
- a Horizon integration to allow configuration of the rating modules and display of pricing information in real time during instance creation
- a CLI client to access this information and easily configure everything

Technically we are using all the elements that are used in the various OpenStack projects like oslo, stevedore, pecan... CloudKitty is highly modular and allows integration / development of third-party collection and rating modules and output formats.
A roadmap is available on the project wiki page (the link is at the end of this email), but we are clearly hoping to have some feedback and ideas on how to improve the project and reach a tighter integration with OpenStack. The project source code is available at http://github.com/stackforge/cloudkitty More stuff will be available on stackforge as soon as the reviews get validated, like python-cloudkittyclient and cloudkitty-dashboard, so stay tuned. The project's wiki page (https://wiki.openstack.org/wiki/CloudKitty) provides more information, and you can reach us via IRC on freenode: #cloudkitty. Developer's documentation is on its way to readthedocs too. We plan to present CloudKitty in detail during the Paris Summit, but we would love to hear from you sooner...

Cheers, Christophe and Objectif Libre

Christophe Sauthier, CEO Fondateur
Objectif Libre - Infrastructure et Formations Linux
Mail : christophe.sauth...@objectif-libre.com
Mob : +33 (0) 6 16 98 63 96
URL : www.objectif-libre.com
Twitter : @objectiflibre
Re: [openstack-dev] [nova][core] Expectations of core reviewers
On 8/14/2014 11:28 AM, Russell Bryant wrote: On 08/14/2014 10:04 AM, CARVER, PAUL wrote: Daniel P. Berrange [mailto:berra...@redhat.com] wrote: Depending on the usage needs, I think Google hangouts is a quite useful technology. For many-to-many session its limit of 10 participants can be an issue, but for a few-to-many broadcast it could be practical. What I find particularly appealing is the way it can live stream the session over youtube which allows for unlimited number of viewers, as well as being available offline for later catchup. I can't actually offer ATT resources without getting some level of management approval first, but just for the sake of discussion here's some info about the telepresence system we use. -=-=-=-=-=-=-=-=-=- ATS B2B Telepresence conferences can be conducted with an external company's Telepresence room(s), which subscribe to the ATT Telepresence Solution, or a limited number of other Telepresence service provider's networks. Currently, the number of Telepresence rooms that can participate in a B2B conference is limited to a combined total of 20 rooms (19 of which can be ATT rooms, depending on the number of remote endpoints included). -=-=-=-=-=-=-=-=-=- We currently have B2B interconnect with over 100 companies and ATT has telepresence rooms in many of our locations around the US and around the world. If other large OpenStack companies also have telepresence rooms that we could interconnect with I think it might be possible to get management agreement to hold a couple OpenStack meetups per year. Most of our rooms are best suited for 6 people, but I know of at least one 18 person telepresence room near me. An ideal solution would allow attendees to join as individuals from anywhere. A lot of contributors work from home. Is that sort of thing compatible with your system? http://bluejeans.com/ was a good experience. What about Google Hangout OnAir for the PTL and core, while others are view-only with chat/irc questions? 
http://www.google.com/+/learnmore/hangouts/onair.html
Re: [openstack-dev] [nova][core] Expectations of core reviewers
Maybe we need to think about this from a distributed software perspective?

* Divide and Conquer? Can we split the topics to create more manageable sub-groups? This way it's not core-vs-non-core but interested-vs-moderately-interested. (Of course, this is much the way the mailing list works.) Perhaps OnAir would work well for that? How about geographic separation? Meetings per time-zone that roll up into larger meetings (see More Workers below). This is much the same way the regional openstack meetups work, but with specific topics. Of course, then we get replication latency :)

* More workers? Can we assign topic owners? Cores might delegate a topic to a non-core member to gather consensus, concerns, and suggestions, and summarize the result to present during weekly IRC meetings.

* Better threading? Are there other tools than mailing lists for talking about these topics? Would mind-mapping software [1] work better for keeping the threads manageable?

-Sandy

[1] http://en.wikipedia.org/wiki/Mind_map
Re: [openstack-dev] [all] The future of the integrated release
On 8/14/2014 6:42 PM, Doug Hellmann wrote: On Aug 14, 2014, at 4:41 PM, Joe Gordon joe.gord...@gmail.com wrote: On Wed, Aug 13, 2014 at 12:24 PM, Doug Hellmann d...@doughellmann.com wrote: On Aug 13, 2014, at 3:05 PM, Eoghan Glynn egl...@redhat.com wrote:

At the end of the day, that's probably going to mean saying No to more things. Every time I turn around everyone wants the TC to say No to things, just not to their particular thing. :) Which is human nature. But I think if we don't start saying No to more things we're going to end up with a pile of mud that no one is happy with.

That we're being so abstract about all of this is frustrating. I get that no-one wants to start a flamewar, but can someone be concrete about what they feel we should say 'no' to but are likely to say 'yes' to?

I'll bite, but please note this is a strawman. No:
* Accepting any more projects into incubation until we are comfortable with the state of things again
* Marconi
* Ceilometer

Well -1 to that, obviously, from me. Ceilometer is on track to fully execute on the gap analysis coverage plan agreed with the TC at the outset of this cycle, and has an active plan in progress to address architectural debt.

Yes, there seems to be an attitude among several people in the community that the Ceilometer team denies that there are issues and refuses to work on them. Neither of those things is the case from our perspective.

Totally agree.

Can you be more specific about the shortcomings you see in the project that aren't being addressed? Once again, this is just a straw man. You're not the first person to propose ceilometer as a project to kick out of the release, though, and so I would like to be talking about specific reasons rather than vague frustrations.

I'm just not sure OpenStack has 'blessed' the best solution out there.
https://wiki.openstack.org/wiki/Ceilometer/Graduation#Why_we_think_we.27re_ready

* Successfully passed the challenge of being adopted by 3 related projects which have agreed to join or use ceilometer:
  * Synaps
  * Healthnmon
  * StackTach

StackTach seems to still be under active development (http://git.openstack.org/cgit/stackforge/stacktach/log/), is used by Rackspace in production, and from everything I hear is more mature than ceilometer.

StackTach is older than ceilometer, but does not do all of the things ceilometer does now and aims to do in the future. It has been a while since I last looked at it, so the situation may have changed, but some of the reasons StackTach would not be a full replacement for ceilometer include: it only works with AMQP; it collects notification events, but doesn't offer any metering ability per se (no tracking of values like CPU or bandwidth utilization); it only collects notifications from some projects, and doesn't have a way to collect data from swift, which doesn't emit notifications; and it does not integrate with Heat to trigger autoscaling alarms.

Well, that's my cue. Yes, StackTach was started before the incubation process was established and it solves other problems, specifically around usage, billing and performance monitoring, things I wouldn't use Ceilometer for. But, if someone asked me what they should use for metering today, I'd point them towards Monasca in a heartbeat. Another non-blessed project. It is nice to see that Ceilometer is working to solve their problems, but there are other solutions operators should consider until that time comes. It would be nice to see the TC endorse those too. Solve the users' need first.

We did work with a few of the StackTach developers on bringing event collection into ceilometer, and that work is allowing us to modify the way we store the meter data that causes a lot of the performance issues we've seen.
That work is going on now and will be continued into Kilo, when we expect to be adding drivers for time-series databases more appropriate for that type of data. StackTach isn't actively contributing to Ceilometer any more. Square peg/round hole. We needed some room to experiment with alternative solutions and the rigidity of the process was a hindrance. Not a problem with the core team, just a problem with the dev process overall. I recently suggested that the Ceilometer API (and integration tests) be separated from the implementation (two repos) so others might plug in a different implementation while maintaining compatibility, but that wasn't well received. Personally, I'd like to see that model extended for all OpenStack projects. Keep compatible at the API level and welcome competing implementations. We'll be moving StackTach.v3 [1] to StackForge soon and following that model. The API and integration tests are one repo (with a bare-bones implementation to make the
Re: [openstack-dev] [all] The future of the integrated release
On 8/16/2014 10:09 AM, Chris Dent wrote: On Fri, 15 Aug 2014, Sandy Walsh wrote:

I recently suggested that the Ceilometer API (and integration tests) be separated from the implementation (two repos) so others might plug in a different implementation while maintaining compatibility, but that wasn't well received. Personally, I'd like to see that model extended for all OpenStack projects. Keep compatible at the API level and welcome competing implementations.

I think this is a _very_ interesting idea, especially the way it fits in with multiple themes that have bounced around the list lately, not just this thread:

* Improving project-side testing; that is, pre-gate integration testing.
* Providing a framework (at least conceptual) on which to inform the tempest-libification.
* Solidifying both intra- and inter-project API contracts (both HTTP and notifications).
* Providing a solid basis on which to enable healthy competition between implementations.
* Helping to ensure that the various projects work to the goals of their public-facing name rather than their internal name (e.g. Telemetry vs ceilometer).

+1 ... love that take on it.

Given the usual trouble with resource availability it seems best to find tactics that can be applied to multiple strategic goals.

Exactly! You get it.
Re: [openstack-dev] [all] The future of the integrated release
On 8/18/2014 9:27 AM, Thierry Carrez wrote: Clint Byrum wrote:

Here's why folk are questioning Ceilometer:

Nova is a set of tools to abstract virtualization implementations.
Neutron is a set of tools to abstract SDN/NFV implementations.
Cinder is a set of tools to abstract block-device implementations.
Trove is a set of tools to simplify consumption of existing databases.
Sahara is a set of tools to simplify Hadoop consumption.
Swift is a feature-complete implementation of object storage, none of which existed when it was started.
Keystone supports all of the above, unifying their auth.
Horizon supports all of the above, unifying their GUI.

Ceilometer is a complete implementation of data collection and alerting. There is no shortage of implementations that exist already.

I'm also core on two projects that are getting some push back these days: Heat is a complete implementation of orchestration. There are at least a few of these already in existence, though not as many as there are data collection and alerting systems. TripleO is an attempt to deploy OpenStack using tools that OpenStack provides. There are already quite a few other tools that _can_ deploy OpenStack, so it stands to reason that people will question why we don't just use those. It is my hope we'll push more into the "unifying the implementations" space and withdraw a bit from the "implementing stuff" space.

So, you see, people are happy to unify around a single abstraction, but not so much around a brand new implementation of things that already exist.

Right, most projects focus on providing abstraction above implementations, and that abstraction is where the real domain expertise of OpenStack should be (because no one else is going to do it for us). Every time we reinvent something, we are at larger risk because we are out of our common specialty, and we just may not be as good as the domain specialists.
That doesn't mean we should never reinvent something, but we need to be damn sure it's a good idea before we do. It's sometimes less fun to piggyback on existing implementations, but if they exist that's probably what we should do. While Ceilometer is far from alone in that space, what sets it apart is that even after it was blessed by the TC as the one we should all converge on, we keep on seeing competing implementations for some (if not all) of its scope. Convergence did not happen, and without convergence we struggle in adoption. We need to understand why, and if this is fixable.

So, here's what happened with StackTach ...

We had two teams working on StackTach: one group working on the original program (v2) and another working on Ceilometer integration of our new design. The problem was, there was no way we could compete with the speed of the v2 team. Every little thing we needed to do in OpenStack was a herculean effort. Submit a branch in one place, it needs to go somewhere else. Spend weeks trying to land a branch. Endlessly debate minutiae. It goes on. I know that's the nature of running a large project. And I know everyone is feeling it.

We quickly came to realize that, if the stars aligned and we did what we needed to do, we'd only be playing catch-up to the other StackTach team. And StackTach had growing pains. We needed this new architecture to solve real business problems *today*. This isn't "build it and they will come"; this is "we know it's valuable ... when can I have the new one?" Like everyone, we have incredible pressure to deliver and we can't accurately forecast with so many uncontrollable factors.

Much of what is now StackTach.v3 is (R)esearch, not (D)evelopment. With R, we need to be able to run a little fast-and-loose. Not every pull request is a masterpiece. Our plans are going to change. We need to have room to experiment. If it was all just D, yes, we could be more formal.
But we frequently go down a road to find a dead end and need to adjust. We started on StackTach.v3 outside of formal OpenStack. It's still open source. We still talk with interested parties (including ceilo) about the design and how we're going to fulfill their needs, but we're mostly head-down trying to get a production ready release in place. In the process, we're making all of StackTach.v3 as tiny repos that other groups (like Ceilo and Monasca) can adopt if they find them useful. Even our impending move to StackForge is going to be a big productivity hit, but it's necessary for some of our potential contributors. Will we later revisit integration with Ceilometer? Possibly, but it's not a priority. We have to serve the customers that are screaming for v3. Arguably this is more of a BDFL model, but in order to innovate quickly, get to large-scale production and remain competitive it may be necessary. This is why I'm pushing for an API-first model in OpenStack. Alternative implementations shouldn't have to live outside the tribe. (as always, my view only)
[openstack-dev] StackTach.v3 - Screencasts ...
Hey y'all, We've started a screencast series on the StackTach.v3 dev efforts [1]. It's still early days, so subscribe to the playlist for updates. The videos start with the StackTach/Ceilometer integration presentation at the Hong Kong summit, which is useful for background and motivation, but then get into our current dev strategy and state-of-the-union. If you're interested, we will be at the Ops Meetup in San Antonio next week and would love to chat about your monitoring, usage and billing requirements. All the best! -S

[1] https://www.youtube.com/playlist?list=PLmyM48VxCGaW5pPdyFNWCuwVT1bCBV5p3
Re: [openstack-dev] Treating notifications as a contract
Is there anything slated for the Paris summit around this? I just spent nearly a week parsing Nova notifications and the pain of no schema has overtaken me. We're chatting with IBM about CADF and getting down to specifics on its applicability to notifications. Once I get StackTach.v3 into production I'm keen to get started on revisiting the notification format and oslo.messaging support for notifications. Perhaps a hangout for those keenly interested in doing something about this? Thoughts? -S From: Eoghan Glynn [egl...@redhat.com] Sent: Monday, July 14, 2014 8:53 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [all] Treating notifications as a contract So what we need to figure out is how exactly this common structure can be accommodated without reverting back to what Sandy called the "wild west" in another post. I got the impression that "wild west" is what we've already got (within the payload)? Yeah, exactly, that was my interpretation too. So basically just to ensure that the lightweight-schema/common-structure notion doesn't land us back not too far beyond square one (if there are too many degrees-of-freedom in that declaration of a list of dicts with certain required fields that you had envisaged in an earlier post). For example you could write up a brief wiki walking through how an existing widely-consumed notification might look under your vision, say compute.instance.start.end. Then post a link back here as an RFC. Or, possibly better, maybe submit a strawman spec proposal to one of the relevant *-specs repos and invite folks to review in the usual way? Would oslo-specs (as in messaging) be the right place for that? That's a good question. Another approach would be to home in on the producer side that's currently the heaviest user of notifications, i.e. 
nova, and propose the strawman to nova-specs given that (a) that's where much of the change will be needed, and (b) many of the notification patterns originated in nova and were subsequently aped by other projects as they were spun up. My thinking is the right thing to do is bounce around some questions here (or perhaps in a new thread if this one has gone far enough off track to have dropped people) and catch up on some loose ends. Absolutely! For example: It appears that CADF was designed for this sort of thing and was considered at some point in the past. It would be useful to know more of that story if there are any pointers. My initial reaction is that CADF has the stank of enterprisey all over it rather than "less is more" and "worse is better", but that's a completely uninformed and thus unfair opinion. TBH I don't know enough about CADF, but I know a man who does ;) (gordc, I'm looking at you!) Another question (from elsewhere in the thread) is whether it is worth, in the Ironic notifications, trying to cook up something generic, or just carrying on with what's being used. Well, my gut instinct is that the content of the Ironic notifications is perhaps on the outlier end of the spectrum compared to the more traditional notifications we see emitted by nova, cinder etc. So it may make better sense to concentrate initially on how contractizing these more established notifications might play out. This feels like something that we should be thinking about with an eye to the K* cycle - would you agree? Yup. Thanks for helping to tease this all out and provide some direction on where to go next. Well thank *you* for picking up the baton on this and running with it :) Cheers, Eoghan
Re: [openstack-dev] Treating notifications as a contract
On 9/3/2014 11:32 AM, Chris Dent wrote: On Wed, 3 Sep 2014, Sandy Walsh wrote: We're chatting with IBM about CADF and getting down to specifics on its applicability to notifications. Once I get StackTach.v3 into production I'm keen to get started on revisiting the notification format and oslo.messaging support for notifications. Perhaps a hangout for those keenly interested in doing something about this? That seems like a good idea. I'd like to be a part of that. Unfortunately I won't be at summit but would like to contribute what I can before and after. I took some notes on this a few weeks ago and extracted what seemed to be the two main threads or ideas that were revealed by the conversation in this thread: * At the micro level, have versioned schemas for notifications such that one end can declare "I am sending version X of notification foo.bar.Y" and the other end can effectively deal. Yes, that's table-stakes I think. Putting structure around the payload section. Beyond type and version we should be able to attach meta information like public/private visibility and perhaps hints for external mapping (this trait maps to that trait in CADF, for example). * At the macro level, standardize a packaging or envelope for all notifications so that they can be consumed by very similar code. That is: constrain the notifications in some way so we can also constrain the consumer code. That's the intention of what we have now. The top-level traits are standard, the payload is open. We really only require: message_id, timestamp and event_type. For auditing we need to cover Who, What, When, Where, Why, OnWhat, OnWhere, FromWhere. These ideas serve two different purposes: One is to ensure that existing notification use cases are satisfied with robustness and provide a contract between two endpoints. The other is to allow a fecund notification environment that allows and enables many participants. Good goals. 
When Producer and Consumer know what to expect, things are good ... "I know to find the Instance ID here." When the consumer wants to deal with a notification as a generic object, things get tricky (where do I find the instance ID in the payload? What is the image type? Is this an error notification?). Basically, how do we define the principal artifacts for each service and grant the consumer easy/consistent access to them? (like the 7 W's above) I'd really like to find a way to solve that problem. Is that a good summary? What did I leave out or get wrong? Great start! Let's keep it simple and do-able. We should also review the oslo.messaging notification api ... I've got some concerns we've lost our way there. -S
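The minimal envelope described in the exchange above -- required top-level traits of message_id, timestamp and event_type, with an open payload -- might be sketched like this. Everything beyond the three required traits is an illustrative assumption, not an agreed schema:

```python
import uuid
from datetime import datetime, timezone

# The only traits the current convention actually requires.
REQUIRED_TRAITS = ("message_id", "timestamp", "event_type")

def make_notification(event_type, payload, **extra_traits):
    """Build a minimal envelope: standard top-level traits, open payload."""
    msg = {
        "message_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        "payload": payload,      # open: the schema is per event_type
    }
    msg.update(extra_traits)     # e.g. publisher_id, priority
    return msg

def validate_envelope(msg):
    """Consumer-side check that the standard traits are present."""
    missing = [t for t in REQUIRED_TRAITS if t not in msg]
    if missing:
        raise ValueError("missing required traits: %s" % missing)
    return True
```

The point of the envelope is that this validator is the *only* code a generic consumer needs before routing on event_type; everything payload-specific stays behind the per-event schema.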
Re: [openstack-dev] Treating notifications as a contract (CADF)
Yesterday, we had a great conversation with Matt Rutkowski from IBM, one of the authors of the CADF spec. I was having a disconnect on what CADF offers and got it clarified. My assumption was that CADF was a set of transformation/extraction rules for taking data from existing data structures and defining them as well-known things. For example, CADF needs to know who sent this notification. I thought CADF would give us a means to point at an existing data structure and say "that's where you find it". But I was wrong. CADF is a full-on schema/data structure of its own. It would be a fork-lift replacement for our existing notifications. However, if your service hasn't really adopted notifications yet (green field) or you can handle a fork-lift replacement, CADF is a good option. There are a few gotchas though. If you have required data that is outside of the CADF spec, it would need to go in the attachment section of the notification, and that still needs a separate schema to define it. Matt's team is very receptive to extending the spec to include these special cases, though. Anyway, I've written up all the options (as I see them) [1] with the advantages/disadvantages of each approach. It's just a strawman, so bend/spindle/mutilate. Look forward to feedback! -S [1] https://wiki.openstack.org/wiki/NotificationsAndCADF On 9/3/2014 12:30 PM, Sandy Walsh wrote: On 9/3/2014 11:32 AM, Chris Dent wrote: On Wed, 3 Sep 2014, Sandy Walsh wrote: We're chatting with IBM about CADF and getting down to specifics on its applicability to notifications. Once I get StackTach.v3 into production I'm keen to get started on revisiting the notification format and oslo.messaging support for notifications. Perhaps a hangout for those keenly interested in doing something about this? That seems like a good idea. I'd like to be a part of that. Unfortunately I won't be at summit but would like to contribute what I can before and after. 
[remainder of quoted text snipped]
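The CADF event structure discussed at the top of this thread is a full schema of its own, built around the who/what/when W's. A rough sketch of its shape, with field names based on a loose reading of the DMTF CADF spec (treat them as best-effort, not authoritative):

```python
# Components a CADF activity event carries, per my reading of the spec;
# service-specific data that falls outside the spec goes in attachments.
CADF_REQUIRED = ("typeURI", "id", "eventType", "action",
                 "outcome", "initiator", "target", "observer")

def make_cadf_event(event_id, action, outcome, initiator, target,
                    observer, event_time, attachments=None):
    event = {
        "typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
        "id": event_id,
        "eventType": "activity",
        "eventTime": event_time,   # When
        "action": action,          # What was done, e.g. "create"
        "outcome": outcome,        # success / failure / ...
        "initiator": initiator,    # Who did it
        "target": target,          # OnWhat it was done
        "observer": observer,      # the service reporting the event
    }
    if attachments:                # out-of-spec required data lives here
        event["attachments"] = attachments
    return event
```

This illustrates the "fork-lift" point in the message above: the event *is* this structure, rather than a mapping layer over an existing payload.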
Re: [openstack-dev] StackTach.v3 - Screencasts ...
For those of you playing the home game ... I've just added four new screencasts to the StackTach.v3 playlist. These are technical deep dives into the code added over the last week or so, with demos. For the more complex topics I spend a little time on the background and rationale. StackTach.v3: Stream debugging (24:22) StackTach.v3: Idempotent pipeline processing and debugging (12:16) StackTach.v3: Quincy Quince - the REST API (22:56) StackTach.v3: Klugman the versioned cmdline tool for Quincy (8:46) https://www.youtube.com/playlist?list=PLmyM48VxCGaW5pPdyFNWCuwVT1bCBV5p3 Please add any comments to the videos and I'll try to address them there. Next ... the move to StackForge! Have a great weekend! -S
Re: [openstack-dev] Treating notifications as a contract
Jay Pipes - Wednesday, September 10, 2014 3:56 PM On 09/03/2014 11:21 AM, Sandy Walsh wrote: On 9/3/2014 11:32 AM, Chris Dent wrote: I took some notes on this a few weeks ago and extracted what seemed to be the two main threads or ideas that were revealed by the conversation in this thread: * At the micro level, have versioned schemas for notifications such that one end can declare "I am sending version X of notification foo.bar.Y" and the other end can effectively deal. Yes, that's table-stakes I think. Putting structure around the payload section. Beyond type and version we should be able to attach meta information like public/private visibility and perhaps hints for external mapping (this trait maps to that trait in CADF, for example). CADF doesn't address the underlying problem that Chris mentions above: that our notification events themselves need to have a version associated with them. Instead of versioning the message payloads themselves, CADF focuses versioning on the CADF spec itself, which is less than useful, IMO, and a symptom of what I like to call XML-itis. Well, the spec is the payload, so you can't change the payload without changing the spec. Could be semantics, but I see your point. Where I *do* see some value in CADF is the primitive string codes it defines for resource classifications, actions, and outcomes (Sections A.2.5, A.3.5, and A.4.5 respectively in the CADF spec). I see no value in the XML-itis of the fully-qualified URI long-forms of those primitive string codes. +1 to the XML-itis, but do we really get any value from the resource classifications without them? Other than, yes, that's a good list to work from. For resource classifications, it defines things like compute, storage, service, etc., as well as a structured hierarchy for sub-classifications, like storage/volume or service/block. Actions are string codes for verbs like create, configure or authenticate. Outcomes are string codes for success, failure, etc. 
What I feel we need is a library that matches a (resource_type, action, version) tuple to a JSONSchema document that describes the payload for that combination of resource_type, action, and version. The 7 W's that CADF defines are quite useful and we should try to ensure our notification payloads address as many of them as possible: Who, What, When, Where, Why, On-What, To-Whom, To-Where ... not all are applicable for every notification type. Also, we need to define standard units of measure for numeric fields: MB vs. GB, bps vs. kbps, image type definitions ... ideally all of this should be part of the standard OpenStack nomenclature. These are the things that really belong in oslo and should be used by everything from notifications to the scheduler to flavor definitions, etc. If I were king for a day, I'd have a standardized notification message format that simply consisted of:

resource_class (string) -- from CADF, e.g. service/block
occurred_on (timestamp) -- when the event was published
action (string) -- from CADF, e.g. create
version (int or tuple) -- version of the (resource_class, action)
payload (json-encoded string) -- the message itself
outcome (string) -- still on the fence for this, versus just using payload

Yep, no problem with that, so long as the payload has all the other things we need (versioning, data types, visibility, etc.). There would be an oslo library that would store the codification of the resource classes and actions, along with the mapping of (resource_class, action, version) to the JSONSchema document describing the payload field. Producers of messages would consume the oslo lib like so:

```python
from oslo.notifications import resource_classes
from oslo.notifications import actions
from oslo.notifications import message
```

Not sure how this would look from a packaging perspective, but sure. I'm not sure I like having to define every resource/action type in code and then having an explosion of types in notification.actions ... perhaps that should just be part of the schema definition:

'action_type': string [acceptable values: create, delete, update]

I'd rather see these schemas defined in some machine-readable format (YAML or something) vs. code. Other languages are going to want to consume these notifications and should be able to reuse the definitions.

```python
from nova.compute import power_states
from nova.compute import task_states
...
msg = message.Message(resource_classes.compute.machine,
                      actions.update, version=1)
# msg is now an object that is guarded by the JSONSchema document
# that describes the version 1.0 schema of the UPDATE action
# for the resource class representing a VM (compute.machine).
# This means that if the producer attempts to set an
# attribute of the msg object that is *not* in that JSONSchema
# document, then an AttributeError would be raised.
```

This essentially
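The preference for machine-readable schema definitions keyed by (resource_class, action, version) could be sketched as follows. The registry contents and the hand-rolled checker are hypothetical stand-ins for a YAML file plus a real JSONSchema validator:

```python
# Hypothetical registry: (resource_class, action, version) -> payload schema.
# In practice this would be loaded from YAML so non-Python consumers can
# reuse the same definitions; the checker below just illustrates the idea.
SCHEMAS = {
    ("compute.machine", "update", 1): {
        "required": {"instance_id": str, "power_state": str},
        "optional": {"task_state": str},
    },
}

def check_payload(resource_class, action, version, payload):
    """Validate a payload against its registered schema."""
    schema = SCHEMAS[(resource_class, action, version)]
    for field, ftype in schema["required"].items():
        if field not in payload:
            raise ValueError("missing required field: %s" % field)
        if not isinstance(payload[field], ftype):
            raise TypeError("%s must be %s" % (field, ftype.__name__))
    # Reject attributes not in the schema, mirroring the AttributeError
    # behaviour described for the Message object above.
    allowed = set(schema["required"]) | set(schema["optional"])
    unknown = set(payload) - allowed
    if unknown:
        raise ValueError("fields not in schema: %s" % sorted(unknown))
    return True
```

Bumping the version key means registering a new schema rather than silently mutating an old one, which is the contract consumers need.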
Re: [openstack-dev] [Nova] - do we need .start and .end notifications in all cases ?
Hey Phil, (sorry for top-post, web client) There's no firm rule for requiring .start/.end, and I think your criteria defines it well: long-running transactions (or complex multi-step transactions). The main motivator behind the .start/.end code was .error notifications not getting generated in many cases. We had no idea where something was failing. Putting a .start before the db operation let us know "well, at least the service got the call". For some operations like resize, migrate, etc., the .start/.end is good for auditing and billing. Although, we could do a better job by simply managing the launched_at, deleted_at times better. Later, we found that by reviewing .start/.end deltas we were able to predict pending failures before timeouts actually occurred. But no, they're not mandatory, and a single notification should certainly be used for simple operations. Cheers! -S From: Day, Phil [philip@hp.com] Sent: Monday, September 22, 2014 8:03 AM To: OpenStack Development Mailing List (openstack-dev@lists.openstack.org) Subject: [openstack-dev] [Nova] - do we need .start and .end notifications in all cases ? Hi Folks, I’d like to get some opinions on the use of pairs of notification messages for simple events. I get that for complex operations on an instance (create, rebuild, etc.) a start and end message are useful to help instrument progress and how long the operations took. However we also use this pattern for things like aggregate creation, which is just a single DB operation – and it strikes me as kind of overkill and probably not all that useful to any external system compared to a single “.create” event after the DB operation. 
There is a change up for review to add notifications for service groups which is following this pattern (https://review.openstack.org/#/c/107954/) – the author isn’t doing anything wrong in that they're just following that pattern, but it made me wonder if we shouldn’t have some better guidance on when to use a single notification rather than a .start/.end pair. Does anyone else have thoughts on this, or know of external systems that would break if we restricted .start and .end usage to long-lived instance operations? Phil
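The .start/.end pattern described above -- emit .start before the risky work so a missing .end (or an explicit .error) is detectable -- can be sketched as a context manager. The `emit` callable here is a stand-in for whatever notifier a service actually uses, not a real oslo API:

```python
from contextlib import contextmanager

@contextmanager
def notify_span(emit, event_prefix, payload):
    """Wrap a long-running operation in .start/.end notifications.

    If the process dies between .start and .end, the orphaned .start
    tells you "at least the service got the call"; an exception turns
    into an explicit .error notification.
    """
    emit(event_prefix + ".start", payload)
    try:
        yield
    except Exception:
        emit(event_prefix + ".error", payload)
        raise
    else:
        emit(event_prefix + ".end", payload)
```

A simple single-DB-operation event, per the discussion above, would skip the wrapper and emit one ".create" notification after the fact.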
Re: [openstack-dev] [Nova] - do we need .start and .end notifications in all cases ?
+1, the high-level code should deal with top-level exceptions and generate .error notifications (though it's a little spotty). Ideally we shouldn't need three events for simple operations. The use of .start/.end vs. logging is a bit of a blurry line. At its heart a notification should provide context around an operation: What happened? Who did it? Who did they do it to? Where did it happen? Where is it going to? etc. Stuff that could be used for auditing/billing. That's their main purpose. But for mission-critical operations (create instance, etc.) notifications give us a hot-line to god. "Something is wrong!" vs. having to pore over log files looking for problems. Real-time. Low latency. I think it's a case-by-case judgement call which should be used. From: Day, Phil [philip@hp.com] I'm just a tad worried that this sounds like it's starting to use notifications as a replacement for logging. If we did this for every CRUD operation on an object, don't we risk flooding the notification system?
Re: [openstack-dev] [Nova] - do we need .start and .end notifications in all cases ?
From: Jay Pipes [jaypi...@gmail.com] Sent: Monday, September 22, 2014 11:51 AM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Nova] - do we need .start and .end notifications in all cases ? On 09/22/2014 07:37 AM, Sandy Walsh wrote: For some operations like resize, migrate, etc., the .start/.end is good for auditing and billing. Although, we could do a better job by simply managing the launched_at, deleted_at times better. I'm sure I'll get no real disagreement from you or Andrew Laski on this... but the above is one of the reasons we really should be moving with pace towards a fully task-driven system, both internally in Nova and externally via the Compute REST API. This would allow us to get rid of the launched_at, deleted_at, created_at, updated_at, etc. fields in many of the database tables and instead have a data store for tasks (RDBMS or otherwise) that had start and end times in the task record, along with codified task types. You can see what I had in mind for the public-facing side of this here: http://docs.oscomputevnext.apiary.io/#schema See the schema for server task and server task item. Totally agree. Though I would go one step further and say the task state transitions should be managed by notifications. Then oslo.messaging is reduced to the simple notifications interface (no RPC). Notifications follow proper retry semantics and control tasks. Tasks themselves can restart/retry/etc. (I'm sure I'm singing to the choir) -S
Re: [openstack-dev] [Ceilometer] Nomination of Sandy Walsh to core team
Thanks John, happy to help out if possible, and I agree that events could use some extra attention. -S From: Herndon, John Luke [john.hern...@hp.com] Sent: Monday, December 09, 2013 5:30 PM To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [Ceilometer] Nomination of Sandy Walsh to core team Hi There! I'm not 100% sure what the process is around electing an individual to the core team (i.e., can a non-core person nominate someone?). However, I believe the ceilometer core team could use a member who is more active in the development of the event pipeline. A core developer in this area will not only speed up review times for event patches, but will also help keep new contributions focused on the overall eventing vision. To that end, I would like to nominate Sandy Walsh from Rackspace to ceilometer-core. Sandy is one of the original authors of StackTach, and spearheaded the original stacktach-ceilometer integration. He has been instrumental in many of my code reviews, and has contributed much of the existing event storage and querying code. Thanks, John Herndon Software Engineer HP Cloud
Re: [openstack-dev] Healthnmon
On 12/18/2013 06:28 AM, Oleg Gelbukh wrote: I would echo that question. It looks like the integration plan didn't work out, and healthnmon development has either stalled or gone dark. Anyone have information on that? I think that's the case. There was no mention of Healthnmon at the last summit. -- Best regards, Oleg Gelbukh Mirantis Inc. On Tue, Dec 17, 2013 at 11:29 PM, David S Taylor da...@bluesunrise.com wrote: Could anyone tell me about the status of the Healthnmon project [1]? There is a proposal [2] to integrate Ceilometer and Healthnmon, which is about 1 year old. I am interested in developing a monitoring solution, and discovered that there may already be a project and community in place around OpenStack monitoring, or not. [1] https://github.com/stackforge/healthnmon/tree/master/healthnmon [2] https://wiki.openstack.org/wiki/Ceilometer/CeilometerAndHealthnmon Thanks, -- David S Taylor CTO, Bluesunrise 707 529-9194 da...@bluesunrise.com
Re: [openstack-dev] [nova] How do we format/version/deprecate things from notifications?
On 12/18/2013 01:44 PM, Nikola Đipanov wrote: On 12/18/2013 06:17 PM, Matt Riedemann wrote: On 12/18/2013 9:42 AM, Matt Riedemann wrote: The question came up in this patch [1]: how do we deprecate and remove keys in the notification payload? In this case I need to deprecate and replace the 'instance_type' key with 'flavor' per the associated blueprint. [1] https://review.openstack.org/#/c/62430/ By the way, my thinking is it's handled like a deprecated config option: you deprecate it for a release, make sure it's documented in the release notes, and then drop it in the next release. Anyone that hasn't switched over is broken until they start consuming the new key. FWIW - I am OK with this approach - but we should at least document it. I am also thinking that we may want to make it explicit like oslo.config does it. Likewise ... until we get defined schemas and versioning on notifications, it seems reasonable. A post to the ML is nice too :) Thanks, N.
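The deprecate-like-a-config-option approach discussed above amounts to emitting both keys for one release, then dropping the old one. A hypothetical sketch of the producer side, using the instance_type/flavor example from the patch:

```python
def build_payload(flavor, include_deprecated=True):
    """Build the notification payload during the deprecation cycle.

    For one release, emit both the new 'flavor' key and the old
    'instance_type' alias so consumers have time to switch; the
    following release flips include_deprecated off and drops the alias.
    """
    payload = {"flavor": flavor}
    if include_deprecated:
        payload["instance_type"] = flavor   # deprecated, remove next cycle
    return payload
```

Until notifications grow real schema versioning, a flag like this plus a release note (and an ML post) is about the best contract a consumer gets.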
Re: [openstack-dev] [nova] How do we format/version/deprecate things from notifications?
On 12/18/2013 03:00 PM, Russell Bryant wrote: We really need proper versioning for notifications. We've had a blueprint open for about a year, but AFAICT, nobody is actively working on it. https://blueprints.launchpad.net/nova/+spec/versioned-notifications IBM is behind this effort now and is keen to get CADF support around notifications. It seems to handle all of our use cases.
Re: [openstack-dev] [Heat] [Nova] [oslo] [Ceilometer] about notifications : huge and may be non secure
On 01/29/2014 11:50 AM, Swann Croiset wrote: Hi stackers, I would like to share my concerns here about notifications. I'm working [1] on Heat notifications and I noticed that: 1/ Heat uses its context to store 'password' 2/ Heat and Nova store 'auth_token' in the context too. I didn't check other projects, except for Neutron, which doesn't store auth_token. This information is consequently sent through their notifications. I guess we consider the broker to be secure, and network communications with the services too, BUT shouldn't we delete this data anyway, since IIRC it is never used (at least by Ceilometer)? That would do away with the security question entirely. My other concern is the size (KB) of notifications: 70% of it is the auth_token (with PKI)! We can reduce the volume drastically and easily by deleting this data from notifications. I know that RabbitMQ (or others) is very robust and can handle this volume, but when I see this kind of improvement, I'm tempted to do it. I see an easy way to fix that in oslo-incubator [2]: delete these keys from the context if present, config-driven, with password and auth_token by default. Thoughts? Yeah, there was a bunch of work in nova to eliminate these sorts of fields from the notification payload. They should certainly be eliminated from other services as well. Ideally, as you mention, at the oslo layer. We assume the notifications can be large, but they shouldn't be that large. The CADF work that IBM is doing to provide versioning and schemas to notifications will go a long way here. They have provisions for marking fields as private. I think this is the right way to go, but we may have to do some hack fixes in the short term. 
-S [1] https://blueprints.launchpad.net/ceilometer/+spec/handle-heat-notifications [2] https://github.com/openstack/oslo-incubator/blob/master/openstack/common/notifier/rpc_notifier.py and others
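The short-term fix proposed in this thread -- dropping password and auth_token from the context before publishing -- might look like this. The key list is an assumption mirroring the proposal above (config-driven, those two keys by default), not actual oslo-incubator code:

```python
# Default sensitive keys; in the real fix this list would come from config.
SENSITIVE_KEYS = ("password", "auth_token")

def scrub_context(context, keys=SENSITIVE_KEYS):
    """Return a copy of the notification context with sensitive (and,
    for PKI tokens, very large) fields removed before publishing.

    Copying rather than mutating means the service's own in-memory
    context is untouched; only the wire payload shrinks.
    """
    return {k: v for k, v in context.items() if k not in keys}
```

Dropping a multi-KB PKI auth_token this way also addresses the message-size concern, independent of the security question.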
Re: [openstack-dev] [oslo-notify] notifications consumed by multiple subscribers
The notification system can specify multiple queues to publish to, so each of your dependent services can feed from a separate queue. However, there is a critical bug in oslo.messaging that has broken this feature: https://bugs.launchpad.net/nova/+bug/1277204 Hopefully it'll get fixed quickly and you'll be able to do what you need. There were plans for a notification consumer in oslo.messaging, but I don't know where that stands. I'm working on a standalone notification consumer library for rabbit. -S From: Sanchez, Cristian A [cristian.a.sanc...@intel.com] Sent: Tuesday, February 11, 2014 2:28 PM To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [oslo-notify] notifications consumed by multiple subscribers Hi, I’m planning to use oslo.notify mechanisms to implement a climate blueprint: https://blueprints.launchpad.net/climate/+spec/notifications. Ideally, the notifications sent by climate should be received by multiple services subscribed to the same topic. Is that possible with oslo.notify? And moreover, is there any mechanism for removing items from the queue? Or should one subscriber be responsible for removing items from it? Thanks Cristian
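The behaviour described above -- one notification fanned out to several queues so each dependent service consumes (and removes) messages independently -- can be mimicked in-process like this. This is an illustration of the semantics, not the oslo.messaging or RabbitMQ implementation:

```python
from collections import deque

class FanoutNotifier:
    """Publish each notification onto every subscriber's own queue.

    Because each subscriber owns a separate queue, one consumer popping
    a message doesn't steal it from the others -- which answers the
    'who removes items from the queue?' question: each subscriber
    removes items only from its own queue.
    """

    def __init__(self):
        self.queues = {}

    def subscribe(self, name):
        q = deque()
        self.queues[name] = q
        return q

    def publish(self, message):
        for q in self.queues.values():
            q.append(message)
```

With a real broker, the same effect comes from binding one queue per consumer group to the notification topic exchange.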
[openstack-dev] Gamification and on-boarding ...
At the Nova mid-cycle meetup we've been talking about the problem of helping new contributors. It got into a discussion of karma, code reviews, bug fixes and establishing a name for yourself before screaming "can someone look at my branch?" in a chat room. We want this experience to be positive, but not everyone has time to hand-hold new people through the dance. The informal OpenStack motto is "automate everything", so perhaps we should consider some form of gamification [1] to help us. Can we offer badges, quests and challenges to new users to lead them on the way to being strong contributors? "Fixed your first bug" badge. "Updated the docs" badge. "Got your blueprint approved" badge. "Triaged a bug" badge. "Reviewed a branch" badge. "Contributed to 3 OpenStack projects" badge. "Fixed a Cells bug" badge. "Constructive in IRC" badge. "Freed the gate" badge. "Reverted a branch from a core" badge. Etc. These can be strung together as quests to lead people along the path. It's more than karma and less sterile than Stackalytics. The Foundation could even promote the rising stars and highlight the leader board. There are gamification-as-a-service offerings out there [2] as well as Fedora Badges [3] (Python and open source) that we may want to consider. Thoughts? -Sandy [1] http://en.wikipedia.org/wiki/Gamification [2] http://gamify.com/ (and many others) [3] https://badges.fedoraproject.org/
Re: [openstack-dev] [Nova] What is the currently accepted way to do plugins
This brings up something that's been gnawing at me for a while now ... why use entry-point based loaders at all? I don't see the problem they're trying to solve. (I thought I got it for a while, but I was clearly fooling myself)

1. If you use the "load all drivers in this category" feature, that's a security risk since any compromised python library could hold a trojan.
2. Otherwise you have to explicitly name the plugins you want (or don't want) anyway, so why have the extra indirection of the entry-point? Why not just name the desired modules directly?
3. The real value of a loader would be to also extend/manage the python path ... that's where the deployment pain is. Give me a fully-qualified driver name and take care of the pathing for me. Abstracting the module/class/function name isn't a great win.

I don't see where the value is for the added pain (entry-point management/package metadata) it brings. CMV, -S From: Russell Bryant [rbry...@redhat.com] Sent: Tuesday, March 04, 2014 1:29 PM To: Murray, Paul (HP Cloud Services); OpenStack Development Mailing List Subject: Re: [openstack-dev] [Nova] What is the currently accepted way to do plugins On 03/04/2014 06:27 AM, Murray, Paul (HP Cloud Services) wrote: One of my patches has a query asking if I am using the agreed way to load plugins: https://review.openstack.org/#/c/71557/ I followed the same approach as filters/weights/metrics using nova.loadables. Was there an agreement to do it a different way? And if so, what is the agreed way of doing it? A pointer to an example or even documentation/wiki page would be appreciated. The short version is entry-point based plugins using stevedore. We should be careful though. We need to limit what we expose as external plug points, even if we consider them unstable. If we don't want it to be public, it may not make sense for it to be a plugin interface at all.
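To make point 2 concrete, a direct-naming loader needs nothing more than the stdlib — no package metadata, no entry points. This is a hypothetical helper for illustration, not Nova code:

```python
import importlib

def load_driver(path):
    """Load a driver from a fully-qualified "module:attribute" path.

    This is the direct-naming alternative to entry-point indirection:
    the config value names the code explicitly, so nothing gets
    auto-loaded behind the operator's back.
    """
    module_name, _, attr = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

# Point straight at the code you want; stdlib "json:dumps" is used
# here purely as a stand-in for a real driver path.
dumps = load_driver("json:dumps")
print(dumps({"driver": "loaded"}))
```

The trade-off, as noted above, is that this does nothing to manage the python path for you; it only removes the entry-point layer.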
-- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] What is the currently accepted way to do plugins
And sorry, as to your original problem, the loadables approach is kinda messy since only the classes that are loaded when *that* module is loaded are used (vs. explicitly specifying them in a config). You may get different results when the flow changes. Either entry-points or config would give reliable results. On 03/04/2014 03:21 PM, Murray, Paul (HP Cloud Services) wrote: In a chat with Dan Smith on IRC, he was suggesting that the important thing was not to use class paths in the config file. I can see that internal implementation should not be exposed in the config files - that way the implementation can change without impacting the nova users/operators. There are plenty of easy ways to deal with that problem vs. entry points. MyModule.get_my_plugin() ... which can point to anywhere in the module permanently. Also, we don't have any of the headaches of merging setup.cfg sections (as we see with oslo.* integration). Sandy, I'm not sure I really get the security argument. Python provides every means possible to inject code, not sure plugins are so different. Certainly agree on choosing which plugins you want to use though. The concern is that any compromised part of the python eco-system can get auto-loaded with the entry-point mechanism. Let's say Nova auto-loads all modules with entry-points in the [foo] section. All I have to do is create a setup that has a [foo] section and my code is loaded. Explicit is better than implicit. So, assuming we don't auto-load modules ... what does the entry-point approach buy us? From: Russell Bryant [rbry...@redhat.com] We should be careful though. We need to limit what we expose as external plug points, even if we consider them unstable. If we don't want it to be public, it may not make sense for it to be a plugin interface at all. I'm not sure what the concern with introducing new extension points is? OpenStack is basically just a big bag of plugins. If it's optional, it's supposed to be a plugin (according to the design tenets).
-- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] What is the currently accepted way to do plugins
On 03/04/2014 05:00 PM, Kevin L. Mitchell wrote: On Tue, 2014-03-04 at 12:11 -0800, Dan Smith wrote: Now, the actual concern is not related to any of that, but about whether we're going to open this up as a new thing we support. In general, my reaction to adding new APIs people expect to be stable is no. However, I understand why things like the resource reporting and even my events mechanism are very useful for deployers to do some plumbing and monitoring of their environment -- things that don't belong upstream anyway. So I'm conflicted. I think that for these two cases, as long as we can say that it's not a stable interface, I think it's probably okay. However, things like we've had in the past, where we provide a clear plug point for something like Compute manager API class are clearly off the table, IMHO. How about using 'unstable' as a component of the entrypoint group? E.g., nova.unstable.events… Wouldn't that defeat the point of entry points ... immutable endpoints? What happens when an unstable event is deemed stable? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Mistral] Crack at a Real life workflow
DSL's are tricky beasts. On one hand I like giving a tool to non-developers so they can do their jobs, but I always cringe when the DSL reinvents the wheel for basic stuff (compound assignment expressions, conditionals, etc). YAML isn't really a DSL per se, in the sense that it has no language constructs. As compared to a Ruby-based DSL (for example) where you still have Ruby under the hood for the basic stuff and extensions to the language for the domain-specific stuff. Honestly, I'd like to see a killer object model for defining these workflows as a first step. What would a python-based equivalent of that real-world workflow look like? Then we can ask ourselves, does the DSL make this better or worse? Would we need to expose things like email handlers, or leave that to the general python libraries? $0.02 -S On 03/05/2014 10:50 PM, Dmitri Zimine wrote: Folks, I took a crack at using our DSL to build a real-world workflow. Just to see how it feels to write it. And how it compares with alternative tools. This one automates a page from OpenStack operation guide: http://docs.openstack.org/trunk/openstack-ops/content/maintenance.html#planned_maintenance_compute_node Here it is https://gist.github.com/dzimine/9380941 or here http://paste.openstack.org/show/72741/ I have a bunch of comments, implicit assumptions, and questions which came to mind while writing it. Want your and other people's opinions on it. But gist and paste don't let you annotate lines!!! :( Maybe we can put it on the review board, even with no intention to check in, to use for discussion? Any interest? DZ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
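For the sake of discussion, a python-based equivalent might start from something like this — a minimal sketch with invented names, not a proposal for Mistral's actual API. The point is that conditionals, assignment and composition come from the host language for free:

```python
class Task:
    """One step in a workflow: a name, a plain Python callable, and an
    optional next task to run on success."""

    def __init__(self, name, action, on_success=None):
        self.name = name
        self.action = action          # plain Python callable
        self.on_success = on_success  # next Task, or None

class Workflow:
    """Walks the task chain, threading a context dict through it."""

    def __init__(self, start):
        self.start = start

    def run(self, context):
        task, trace = self.start, []
        while task is not None:
            context = task.action(context)
            trace.append(task.name)
            task = task.on_success
        return context, trace

# Two-step toy version of a maintenance flow: evacuate, then confirm.
confirm = Task("confirm", lambda ctx: dict(ctx, confirmed=True))
evacuate = Task("evacuate", lambda ctx: dict(ctx, evacuated=True),
                on_success=confirm)
ctx, trace = Workflow(evacuate).run({"host": "compute-1"})
print(trace)  # ['evacuate', 'confirm']
```

Comparing a model like this against the YAML version is exactly the "does the DSL make this better or worse?" question posed above.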
Re: [openstack-dev] [Mistral] Crack at a Real life workflow
On 03/06/2014 02:16 PM, Renat Akhmerov wrote: IMO, it looks not bad (sorry, I’m biased too) even now. Keep in mind this is not the final version, we keep making it more expressive and concise. As for killer object model it’s not 100% clear what you mean. As always, the devil is in the details. This is a web service with all the consequences. I assume what you call “object model” here is nothing else but a python binding for the web service which we’re also working on. Custom python logic you mentioned will also be possible to easily integrate. Like I said, it’s still a pilot stage of the project. Yeah, the REST aspect is where the tricky part comes in :) Basically, in order to make a grammar expressive enough to work across a web interface, we essentially end up writing a crappy language. Instead, we should focus on the callback hooks to something higher level to deal with these issues. Mistral should just say "I'm done with this task, what should I do next?" and the callback service can make decisions on where in the graph to go next. Likewise with things like sending emails from the backend. Mistral should just call a webhook and let the receiver deal with active states as they choose. Which is why modelling this stuff in code is almost always better and why I'd lean towards the TaskFlow approach to the problem. They're tackling this from a library perspective first and then (possibly) turning it into a service. Just seems like a better fit. It's also the approach taken by Amazon Simple Workflow and many BPEL engines. -S Renat Akhmerov @ Mirantis Inc. On 06 Mar 2014, at 22:26, Joshua Harlow harlo...@yahoo-inc.com wrote: That sounds a little similar to what taskflow is trying to do (I am of course biased). I agree with letting the native language implement the basics (expressions, assignment...) and then building the domain on top of that. Just seems more natural IMHO, and is similar to what linq (in c#) has done. My 3 cents. Sent from my really tiny device...
On Mar 6, 2014, at 5:33 AM, Sandy Walsh sandy.wa...@rackspace.com wrote: DSL's are tricky beasts. On one hand I like giving a tool to non-developers so they can do their jobs, but I always cringe when the DSL reinvents the wheel for basic stuff (compound assignment expressions, conditionals, etc). YAML isn't really a DSL per se, in the sense that it has no language constructs. As compared to a Ruby-based DSL (for example) where you still have Ruby under the hood for the basic stuff and extensions to the language for the domain-specific stuff. Honestly, I'd like to see a killer object model for defining these workflows as a first step. What would a python-based equivalent of that real-world workflow look like? Then we can ask ourselves, does the DSL make this better or worse? Would we need to expose things like email handlers, or leave that to the general python libraries? $0.02 -S On 03/05/2014 10:50 PM, Dmitri Zimine wrote: Folks, I took a crack at using our DSL to build a real-world workflow. Just to see how it feels to write it. And how it compares with alternative tools. This one automates a page from OpenStack operation guide: http://docs.openstack.org/trunk/openstack-ops/content/maintenance.html#planned_maintenance_compute_node Here it is https://gist.github.com/dzimine/9380941 or here http://paste.openstack.org/show/72741/ I have a bunch of comments, implicit assumptions, and questions which came to mind while writing it. Want your and other people's opinions on it. But gist and paste don't let annotate lines!!! :( May be we can put it on the review board, even with no intention to check in, to use for discussion? Any interest? 
DZ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] RFC - using Gerrit for Nova Blueprint review approval
Yep, great idea. Do it. On 03/07/2014 02:53 AM, Chris Behrens wrote: On Mar 6, 2014, at 11:09 AM, Russell Bryant rbry...@redhat.com wrote: […] I think a dedicated git repo for this makes sense. openstack/nova-blueprints or something, or openstack/nova-proposals if we want to be a bit less tied to launchpad terminology. +1 to this whole idea.. and we definitely should have a dedicated repo for this. I’m indifferent to its name. :) Either one of those works for me. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo.messaging] mongodb notification driver
You may want to consider StackTach for troubleshooting (that's what it was initially created for) https://github.com/rackerlabs/stacktach It will consume and record the events as well as give you a gui and cmdline tools for tracing calls by server, request_id, event type, etc. Ping me if you have any issues getting it going. Cheers -S From: Hiroyuki Eguchi [h-egu...@az.jp.nec.com] Sent: Tuesday, March 11, 2014 11:09 PM To: openstack-dev@lists.openstack.org Subject: [openstack-dev] [oslo.messaging] mongodb notification driver I'm envisioning a mongodb notification driver. Currently, for troubleshooting, I'm using the log notification driver: notifications are sent to an rsyslog server and stored in a database using the rsyslog-mysql package. I would like to make it more simple, so I came up with this feature. Ceilometer can manage notifications using mongodb, but Ceilometer should have the role of Metering, not Troubleshooting. If you have any comments or suggestions, please let me know. And please let me know if there's any discussion about this. Thanks. --hiroyuki ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Marconi][TC] Withdraw graduation request
Big +1 From: Jay Pipes [jaypi...@gmail.com] Sent: Thursday, March 20, 2014 8:18 PM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Marconi][TC] Withdraw graduation request This is a very mature stance and well-written email. Thanks, Flavio and all of the Marconi team for having thick skin and responding to the various issues professionally. Cheers, -jay On Thu, 2014-03-20 at 23:59 +0100, Flavio Percoco wrote: Greetings, I'm sending this email on behalf of Marconi's team. As you already know, we submitted our graduation request a couple of weeks ago and the meeting was held on Tuesday, March 18th. During the meeting very important questions and issues were raised that made us think, analyse our current situation, and re-think what would be best for OpenStack and Marconi at this very moment. After some consideration, we've reached the conclusion that this is probably not the right time for this project to graduate and that it'll be fruitful for the project and the OpenStack community if we take another development cycle before coming out of incubation. Here are some things we took into consideration: 1. It's still not clear to the overall community what the goals of the project are. It is not fair for Marconi as a project nor for OpenStack as a community to move forward with this integration when there are still open questions about the project goals. 2. Some critical issues came out of our attempt to have a gate job. For the team, the project and the community this is a very critical point. We've managed to have the gate working but we're still not happy with the results. 3. The drivers currently supported by the project don't cover some important cases related to deploying it. One of them solves a licensing issue but introduces a scale issue whereas the other one solves the scale issue and introduces a licensing issue. Moreover, these drivers have created quite a bit of confusion with regard to what the project's goals are, too. 4.
We've seen the value - and believe in it - of OpenStack's incubation period. During this period, the project has gained maturity in its API, supported drivers and integration with the overall community. 5. Several important questions were brought up in the recent ML discussions. These questions take time and effort, but also represent a key point in the support, development and integration of the project with the rest of OpenStack. We'd like to dedicate to these questions the time they deserve. 6. There are still some open questions in the OpenStack community related to the graduation requirements and the required supported technologies of integrated projects. Based on the aforementioned points, the team would like to withdraw the graduation request and remain an incubated project for one more development cycle. During the upcoming months, the team will focus on solving the issues that arose as part of last Tuesday's meeting. If possible, we would like to request a meeting where we can discuss with the TC - and whoever wants to participate - a set of *most pressing issues* that should be solved before requesting another graduation meeting. The team will be focused on solving those issues and other issues down the road. Although the team believes in the project's technical maturity, we think this is what is best for OpenStack and the project itself community-wise. The open questions are way too important for the team and the community and they shouldn't be ignored or rushed. I'd also like to thank the team and the overall community: the team for its hard work during the last cycle, and the community for being there and providing such important feedback in this process. We look forward to seeing Marconi graduate from incubation. Bests, Marconi's team.
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] ordering of notification 'events' in oslo.messaging
On 03/31/2014 10:55 AM, Gordon Sim wrote: I believe that ordering of notifications at different levels is not guaranteed when receiving those notifications using a notification listener in oslo.messaging. I.e. with something like: notifier = notifier.Notifier(get_transport(CONF), 'compute') notifier.info(ctxt, event_type, payload_1) notifier.warn(ctxt, event_type, payload_2) it's possible that payload_1 is received after payload_2. The root cause is that a different queue is used for events of each level. In practice this is easier to observe with rabbit than qpid, as the qpid driver sends every message synchronously which reduces the likelihood of there being more than one message on the listener's queues from the same notifier. Even for rabbit it takes a couple of thousand events before it usually occurs. Load on either the receiving client or the broker could increase the likelihood of out of order deliveries. Not sure if this is intended, but as it isn't immediately obvious, I thought it would be worth a note to the list. If they're on different queues, the order they appear depends on the consumer(s). It's not really an oslo.messaging issue. You can reproduce it with just two events: warn.publish(Foo) info.publish(Blah) consume from info consume from warn info is out of order. And, it's going to happen anyway if we get into a timeout and requeue() scenario. I think we have to assume that ordering cannot be guaranteed and it's the consumer's responsibility to handle it. -S ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
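One consumer-side way to handle it, assuming each notification carries a comparable timestamp (an assumption, not something oslo.messaging guarantees), is to drain the per-priority queues and merge by timestamp before processing. A minimal sketch:

```python
import heapq

def merge_by_timestamp(*queues):
    """Re-order messages drained from separate per-priority queues.

    Each input list is assumed to be internally ordered (FIFO from a
    single notifier), so a heap-based merge restores global order by
    the embedded 'timestamp' field.
    """
    return list(heapq.merge(*queues, key=lambda m: m["timestamp"]))

# info and warn arrived on different queues, so consuming them
# queue-by-queue would interleave them out of order.
info_queue = [{"timestamp": 1, "event": "compute.start"},
              {"timestamp": 4, "event": "compute.end"}]
warn_queue = [{"timestamp": 2, "event": "compute.retry"}]
ordered = merge_by_timestamp(info_queue, warn_queue)
print([m["event"] for m in ordered])
# ['compute.start', 'compute.retry', 'compute.end']
```

This doesn't help with the requeue() scenario mentioned above, which can reorder messages within a single queue; truly order-sensitive consumers need to tolerate that too.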
Re: [openstack-dev] [ceilometer] PTL candidacy
On 04/02/2014 05:47 PM, Gordon Chung wrote: I'd like to announce my candidacy for PTL of Ceilometer. Woot! ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Ceilometer] Notifications from non-local exchanges
On 10/10/2013 06:16 PM, Neal, Phil wrote: Greetings all, I'm looking at how to expand the ability of our CM instance to consume notifications and have a quick question about the configuration and flow... For the notifications central agent, we rely on the services (i.e. glance, cinder) to drop messages on the same messaging host as used by Ceilometer. From there the listener picks it up and cycles through the plugin logic to convert it to a sample. It's apparent that we can't pass an alternate hostname via the control_exchange values, so is there another method for harvesting messages off of other instances (e.g. another compute node)? Hey Phil, You don't really need to specify the exchange name to consume notifications. It will default to the control-exchange if not specified anyway. How it works isn't so obvious. Depending on the priority of the notification, the oslo notifier will publish on topic.priority using the service's control-exchange. If that queue doesn't exist it'll create it and bind the control-exchange to it. This is so we can publish even if there are no consumers yet. Oslo.rpc creates a 1:1 mapping of routing_key and queue to topic (no wildcards). So we get exchange:service -> binding: routing_key topic.priority -> queue topic.priority (essentially, 1 queue per priority). Which is why, if you want to enable services to generate notifications, you just have to set the driver and the topic(s) to publish on. Exchange is implied and routing key/queue are inferred from topic. Likewise we only have to specify the queue name to consume, since we only need an exchange to publish. I have a bare-bones oslo notifier consumer and client here if you want to mess around with it (and a bare-bones kombu version in the parent). https://github.com/SandyWalsh/amqp_sandbox/tree/master/oslo Not sure if that answered your question or made it worse?
:) Cheers -S - Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
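The naming scheme described above boils down to a couple of lines — this is just an illustration of the convention (the real work happens inside oslo.rpc), using the "notifications.info" default mentioned later in the thread:

```python
def notification_routing(topic, priority):
    """Sketch of the oslo notifier naming convention: messages are
    published with routing key "<topic>.<priority>" on the service's
    control-exchange, and the 1:1 routing_key/queue mapping (no
    wildcards) means the queue gets the same name."""
    routing_key = "%s.%s" % (topic, priority)
    queue_name = routing_key  # 1:1 mapping, essentially 1 queue per priority
    return routing_key, queue_name

print(notification_routing("notifications", "info"))
# ('notifications.info', 'notifications.info')
```

This is also why separate priorities land on separate queues, which is the root of the ordering discussion elsewhere in this digest.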
Re: [openstack-dev] [Ceilometer] Notifications from non-local exchanges
On 10/15/2013 12:28 PM, Neal, Phil wrote: -Original Message- From: Sandy Walsh [mailto:sandy.wa...@rackspace.com] Sent: Thursday, October 10, 2013 6:20 PM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Ceilometer] Notifications from non-local exchanges On 10/10/2013 06:16 PM, Neal, Phil wrote: Greetings all, I'm looking at how to expand the ability of our CM instance to consume notifications and have a quick question about the configuration and flow... For the notifications central agent , we rely on the services (i.e. glance, cinder) to drop messages on the same messaging host as used by Ceilometer. From there the listener picks it up and cycles through the plugin logic to convert it to a sample. It's apparent that we can't pass an alternate hostname via the control_exchange values, so is there another method for harvesting messages off of other instances (e.g. another compute node)? Hey Phil, You don't really need to specify the exchange name to consume notifications. It will default to the control-exchange if not specified anyway. How it works isn't so obvious. Depending on the priority of then notification the oslo notifier will publish on topic.priority using the service's control-exchange. If that queue doesn't exist it'll create it and bind the control-exchange to it. This is so we can publish even if there are no consumers yet. I think the common default is notifications.info, yes? Oslo.rpc creates a 1:1 mapping of routing_key and queue to topic (no wildcards). So we get exchange:service - binding: routing_key topic.priority - queue topic.priority (essentially, 1 queue per priority) Which is why, if you want to enable services to generate notifications, you just have to set the driver and the topic(s) to publish on. Exchange is implied and routing key/queue are inferred from topic. Yep, following up to this point: Oslo takes care of the setup of exchanges on behalf of the services. 
When, say, Glance wants to push notifications onto the message bus, they can set the control_exchange value and the driver (rabbit, for example) and voila! An exchange is set up with a default queue bound to the key. Correct. Likewise we only have to specify the queue name to consume, since we only need an exchange to publish. Here's where my gap is: the notification plugins seem to assume that Ceilometer is sitting on the same messaging node/endpoint as the service. The config file allows us to specify the exchange names for the services , but not endpoints, so if Glance is publishing to notifications.info on rabbit.glance.hpcloud.net, and ceilometer is publishing/consuming from the rabbit.ceil.hpcloud.net node then the Glance notifications won't be collected. Hmm, I think I see your point. All the rabbit endpoints are determined by these switches: https://github.com/openstack/nova/blob/master/etc/nova/nova.conf.sample#L1532-L1592 We will need a way in CM to pull from multiple rabbits. I took another look at the Ceilometer config options...rabbit_hosts takes multiple hosts (i.e. rabbit.glance.hpcloud.net:, rabbit.ceil.hpcloud.net:) but it's not clear whether that's for publishing, collection, or both? The impl_kombu module does cycle through that list to create the connection pool, but it's not clear to me how it all comes together in the plugin instantiation... Nice catch. I'll have a look at that as well. Regardless, I think CM should have separate switches for each collector we run and break out the consume rabbit from the service rabbit. I may be in a position to work on this shortly if that's needed. I have a bare-bones oslo notifier consumer and client here if you want to mess around with it (and a bare-bones kombu version in the parent). Will take a look! https://github.com/SandyWalsh/amqp_sandbox/tree/master/oslo Not sure if that answered your question or made it worse? 
:) Cheers -S - Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Does openstack have a notification system that will let us know when a server changes state ?
Notifications work great. Actually StackTach has a web interface where you can watch the notifications coming through in real-time. We're slowly trying to get Ceilometer to have this functionality. StackTach works with Nova and Glance events currently. https://github.com/rackerlabs/stacktach Here is a video of how to use it (and Stacky, the cmdline interface to it) http://www.sandywalsh.com/2012/10/debugging-openstack-with-stacktach-and.html And, if you're a roll-your-own kinda guy, I have a bare-bones Oslo-based notifier service here you can look at to see how it works: https://github.com/SandyWalsh/amqp_sandbox Feel free to reach out if you have any other questions about it. Notifications are awesome. -S From: openstack learner [openstacklea...@gmail.com] Sent: Friday, October 18, 2013 3:56 PM To: openst...@lists.openstack.org; openstack-dev@lists.openstack.org Subject: [openstack-dev] Does openstack have a notification system that will let us know when a server changes state ? Hi all, I am using the openstack python api. After I boot an instance, I will keep polling the instance status to check if its status changes from BUILD to ACTIVE. My question is: does openstack have a notification system that will let us know when a vm changes state (e.g. goes into ACTIVE state)? Then we won't have to keep on polling it when we need to know the change of the machine state. Thanks xin ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
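To make the contrast with polling concrete, here is a toy callback-style consumer — names and event shapes are invented for illustration, not a real OpenStack API:

```python
class InstanceWatcher:
    """Toy illustration of the notification idea: run callbacks when a
    server reaches ACTIVE, instead of polling its status in a loop."""

    def __init__(self):
        self._callbacks = []

    def on_active(self, callback):
        self._callbacks.append(callback)

    def handle_notification(self, notification):
        # In a real deployment this would be fed by events like
        # compute.instance.update arriving off the message bus.
        if notification.get("state") == "ACTIVE":
            for cb in self._callbacks:
                cb(notification)

seen = []
watcher = InstanceWatcher()
watcher.on_active(lambda n: seen.append(n["instance_id"]))
watcher.handle_notification({"instance_id": "abc", "state": "BUILD"})
watcher.handle_notification({"instance_id": "abc", "state": "ACTIVE"})
print(seen)  # ['abc']
```

The callback fires once, on the BUILD-to-ACTIVE transition, with no polling loop on the client side.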
Re: [openstack-dev] [TripleO][Nova][neutron][Heat][Oslo][Ceilometer][Havana]Single Subscription Point for event notification
Here's the current adoption of notifications in OpenStack ... hope it helps! http://www.sandywalsh.com/2013/09/notification-usage-in-openstack-report.html -S From: Qing He [qing...@radisys.com] Sent: Monday, October 28, 2013 8:48 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [TripleO][Nova][neutron][Heat][Oslo][Ceilometer][Havana]Single Subscription Point for event notification Thanks Angus! Yes, if this rpc notification mechanism works for all other components, e.g., Neutron, in addition to Nova, which seems to be the only documented component working with this notification system. For example, can we do something like Network.instance.shutdown/.end Or Storage.instance.shutdown/.end Or Image.instance.shutdown/.end ... -Original Message- From: Angus Salkeld [mailto:asalk...@redhat.com] Sent: Monday, October 28, 2013 4:36 PM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [TripleO][Nova][neutron][Heat][Oslo][Ceilometer][Havana]Single Subscription Point for event notification On 28/10/13 22:30 +, Qing He wrote: All, I found multiple places/components you can get event alarms, e.g., Heat, Ceilometer, Oslo, Nova etc, notification. But I fail to find any documents as to how to do it in the respective component documents. I 'm wondering if there is document as to if there is a single API entry point where you can subscribe and get event notification from all components, such as Nova, Neutron. Hi, If you are talking about rpc notifications, then this is one wiki page I know about: https://wiki.openstack.org/wiki/SystemUsageData (I have just added some heat notifications to it). 
-Angus Thanks, Qing ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Ceilometer] Suggestions for alarm improvements ...
Hey y'all, Here are a few notes I put together around some ideas for alarm improvements. In order to set it up I spent a little time talking about the Ceilometer architecture in general, including some of the things we have planned for IceHouse. I think Parts 1-3 will be useful to anyone looking into Ceilometer. Part 4 is where the meat of it is. https://wiki.openstack.org/wiki/Ceilometer/AlarmImprovements Look forward to feedback from everyone and chatting about it at the summit. If I missed something obvious, please mark it up so we can address it. -S ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey
On 10/30/2013 03:10 PM, Steven Dake wrote: I will -2 any patch that adds zookeeper as a dependency to Heat. Certainly any distributed locking solution should be plugin based and optional. Just as a database-oriented solution could be the default plugin. Re: the Java issue, we already have optional components in other languages. I know Java is a different league of pain, but if it's an optional component and left as a choice of the deployer, should we care? -S PS As an aside, what are your issues with ZK?
Re: [openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey
Doh, sorry, left out the important part I had originally intended. The ZK unit tests could be split to not run by default, but if you're a ZK shop ... run them yourself. They might not be included in the gerrit tests, but that should be the nature of heavy-weight drivers. We need to do more of this test splitting in general anyway. -S On 10/30/2013 04:20 PM, Sandy Walsh wrote: On 10/30/2013 03:10 PM, Steven Dake wrote: I will -2 any patch that adds zookeeper as a dependency to Heat. Certainly any distributed locking solution should be plugin based and optional. Just as a database-oriented solution could be the default plugin. Re: the Java issue, we already have optional components in other languages. I know Java is a different league of pain, but if it's an optional component and left as a choice of the deployer, should we care? -S PS As an aside, what are your issues with ZK?
Re: [openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey
On 10/30/2013 04:44 PM, Robert Collins wrote: On 31 October 2013 08:37, Sandy Walsh sandy.wa...@rackspace.com wrote: Doh, sorry, left out the important part I had originally intended. The ZK unit tests could be split to not run by default, but if you're a ZK shop ... run them yourself. They might not be included in the gerrit tests, but that should be the nature of heavy-weight drivers. We need to do more of this test splitting in general anyway. Yes... but. We need to aim at production. If ZK is going to be the production sane way of doing it with the reference OpenStack code base, then we absolutely have to have our functional and integration tests run with ZK. Unit tests shouldn't be talking to a live ZK anyhow, so they don't concern me. Totally agree at the functional/integration test level. My concern was having to bring ZK into a dev env. We've already set the precedent with Erlang (rabbitmq). There are HBase (Java) drivers out there and Torpedo tests against a variety of other databases. I think the horse has left the barn. -Rob
Re: [openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey
On 10/30/2013 08:08 PM, Steven Dake wrote: On 10/30/2013 12:20 PM, Sandy Walsh wrote: On 10/30/2013 03:10 PM, Steven Dake wrote: I will -2 any patch that adds zookeeper as a dependency to Heat. Certainly any distributed locking solution should be plugin based and optional. Just as a database-oriented solution could be the default plugin. Sandy, Even if it is optional, some percentage of the userbase will enable it and expect the Heat community to debug and support it. But, that's the nature of every openstack project. I don't support HyperV in Nova or HBase in Ceilometer. The implementers deal with that support. I can help guide someone to those people but have no intentions of standing up those environments. Re: the Java issue, we already have optional components in other languages. I know Java is a different league of pain, but if it's an optional component and left as a choice of the deployer, should we care? -S PS As an aside, what are your issues with ZK? I realize zookeeper exists for a reason. But unfortunately Zookeeper is a server, rather than an in-process library. This means someone needs to figure out how to document, scale, secure, and provide high availability for this component. Yes, that's why we would use it. Same goes for rabbit and mysql. This is extremely challenging for the two server infrastructure components OpenStack server processes depend on today (AMQP, SQL). If the entire OpenStack community saw value in biting the bullet and accepting zookeeper as a dependency and taking on this work, I might be more amenable. Why do other services need to agree on adopting ZK? If some Heat users need it, they can use it. Nova shouldn't care. What we are talking about in the review, however, is that the Heat team bite that bullet, which is a big addition to the scope of work we already execute for the ability to gain a distributed lock.
I would expect there are simpler approaches to solve the problem without dragging the baggage of a new server component into the OpenStack deployment. Yes, there probably are, and alternatives are good. But, as others have attested, ZK is tried and true. Why not support it also? Using zookeeper as is suggested in the review is far different than the way Nova uses Zookeeper. With the Nova use case, Nova still operates just dandy without zookeeper. With zookeeper in the Heat use case, it essentially becomes the default way people are expected to deploy Heat. Why, if it's a plugin? What I would prefer is taskflow over AMQP, to leverage existing server infrastructure (that has already been documented, scaled, secured, and HA-ified). Same problem exists, we're just pushing the ZK decision to another service. Regards -steve
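The "plugin based and optional" locking idea raised earlier in this thread can be sketched as a small driver interface. This is purely illustrative: the interface and class names are made up, the in-memory backend stands in for a database-oriented default plugin, and a ZooKeeper backend (e.g. built on kazoo's lock recipe) would implement the same two methods behind a deployer's config option, without Heat core ever hard-depending on ZK.

```python
import abc
import threading

class LockDriver(abc.ABC):
    """Hypothetical pluggable lock interface: the deployer picks the
    backend, so the service never hard-depends on ZooKeeper."""

    @abc.abstractmethod
    def acquire(self, name, timeout=None):
        """Return True if the named lock was obtained."""

    @abc.abstractmethod
    def release(self, name):
        """Release the named lock."""

class LocalLockDriver(LockDriver):
    """Single-process default backend, standing in for the
    database-oriented default plugin mentioned above."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def acquire(self, name, timeout=None):
        # create the named lock lazily, under a guard so concurrent
        # callers agree on which Lock object backs each name
        with self._guard:
            lock = self._locks.setdefault(name, threading.Lock())
        return lock.acquire(timeout=-1 if timeout is None else timeout)

    def release(self, name):
        self._locks[name].release()

driver = LocalLockDriver()
got = driver.acquire("stack-123", timeout=1)
driver.release("stack-123")
```

The point of the abstraction is that functional/integration gate jobs could exercise whichever backend the reference deployment uses, while a ZK shop runs the ZK driver's tests themselves.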
Re: [openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey
On 10/31/2013 11:43 AM, Monty Taylor wrote: Yes. I'm strongly opposed to ZooKeeper finding its way into the already complex pile of things we use. Monty, is that just because the stack is very complicated now, or something personal against ZK (or Java specifically)? Curious. -S
Re: [openstack-dev] When is it okay for submitters to say 'I don't want to add tests' ?
On 10/30/2013 11:37 PM, Robert Collins wrote: This is a bit of a social norms thread. I've been consistently asking for tests in reviews for a while now, and I get the occasional push-back. I think this falls into a few broad camps: A - there is no test suite at all, adding one is unreasonable B - this thing cannot be tested in this context (e.g. functional tests are defined in a different tree) C - this particular thing is very hard to test D - testing this won't offer benefit E - other things like this in the project don't have tests F - submitter doesn't know how to write tests G - submitter doesn't have time to write tests Now, of these, I think it's fine not to add tests in cases A, B, C in combination with D, and D. I don't think E, F or G are sufficient reasons to merge something without tests, when reviewers are asking for them. G in the special case that the project really wants the patch landed - but then I'd expect reviewers to not ask for tests or to volunteer that they might be optional. Now, if I'm wrong, and folk have different norms about when to accept 'reason X not to write tests' as a response from the submitter - please let me know! I've done a lot of thinking around this topic [1][2] and really it comes down to this: everything can be tested and should be. There is an argument to A, but that goes beyond the scope of our use case I think. If I hear B, I would suspect the tests aren't unit tests, but are functional/integration tests (a common problem in OpenStack). Functional tests are brittle and usually have painful setup sequences. The other cases fall into the -1 camp for me. Tests required. That said, recently I was -1'ed for not updating a test, because I added code that didn't change the program flow, but introduced a new call. According to my rules, that didn't need a test, but I agreed with the logic that people would be upset if the call wasn't made (it was a notification). So a test was added. Totally valid argument.
TL;DR: Tests are always required. We need to fix our tests to be proper unit tests and not functional/integration tests so it's easy to add new ones. -S [1] http://www.sandywalsh.com/2011/06/effective-units-tests-and-integration.html [2] http://www.sandywalsh.com/2011/08/pain-of-unit-tests-and-dynamically.html -Rob
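The anecdote above (a new call that doesn't change program flow still deserving a test) is easy to cover with a mock-based unit test, no broker or functional environment needed. The function and event names here are invented for illustration.

```python
from unittest import mock

def resize_instance(instance_uuid, notifier):
    # the "new call" from the anecdote: no change to program flow,
    # but downstream consumers depend on it being emitted
    notifier.info("compute.instance.resize.start", {"uuid": instance_uuid})
    # ... the actual resize work would go here ...
    notifier.info("compute.instance.resize.end", {"uuid": instance_uuid})

# a Mock stands in for the notifier, so the test asserts the calls
# were made without any messaging infrastructure
notifier = mock.Mock()
resize_instance("abc-123", notifier)
emitted = [call.args[0] for call in notifier.info.call_args_list]
```

This is the kind of cheap, non-brittle unit test the TL;DR argues for: it pins down the contract (the notification fires) without any of the functional-test setup pain.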
Re: [openstack-dev] [Oslo] Improving oslo-incubator update.py
Seeing this thread reminded me: We need support in the update script for entry points in oslo setup.cfg to make their way into the target project. So, if update is getting some love, please keep that in mind.
Re: [openstack-dev] Adding notifications to Horizon
+1 on the inline method. It makes it clear when a notification should be emitted and, as you say, handles exceptions better. Also, if it makes sense for Horizon, consider bracketing long-running operations in .start/.end pairs. This will help with performance tuning and early error detection. More info on well-behaved notifications here: http://www.sandywalsh.com/2013/09/notification-usage-in-openstack-report.html Great to see! -S On 11/25/2013 11:58 AM, Florent Flament wrote: Hi, I am interested in adding AMQP notifications to the Horizon dashboard, as described in the following blueprint: https://blueprints.launchpad.net/horizon/+spec/horizon-notifications There are currently several implementations in OpenStack. While Nova and Cinder define `notify_about_*` methods that are called whenever a notification has to be sent, Keystone uses decorators, which send appropriate notifications when decorated methods are called. I added an implementation proposal to the blueprint's whiteboard, based on the Nova and Cinder implementations. I would be interested in having your opinion about which method would fit best, and whether these notifications make sense at all. Cheers, Florent Flament
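A minimal sketch of the inline .start/.end bracketing being recommended above. The notify helper, event names, and operation are all illustrative, not Horizon's actual API; the shape to note is the explicit .start, the .error-and-reraise on failure, and the .end only on success.

```python
events = []

def notify(event_type, payload):
    # stand-in for the AMQP notifier; name and signature are
    # illustrative, not Horizon's real notification API
    events.append((event_type, payload))

def create_volume(name):
    # .start lets monitoring detect operations that never complete
    notify("dashboard.volume.create.start", {"name": name})
    try:
        volume_id = "vol-0001"  # the real API call would go here
    except Exception as exc:
        # inline emission keeps the failure path explicit and auditable
        notify("dashboard.volume.create.error",
               {"name": name, "reason": str(exc)})
        raise
    # .end carries the outcome; the start/end delta gives the duration
    notify("dashboard.volume.create.end",
           {"name": name, "volume_id": volume_id})
    return volume_id

create_volume("backups")
```

Compared with the decorator approach, the inline calls make it obvious at the call site exactly which payload fields each event carries and what happens on the exception path.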
[openstack-dev] [Ceilometer] storage driver testing
Hey! We've ballparked that we need to store a million events per day. To that end, we're flip-flopping between sql and no-sql solutions, hybrid solutions that include elastic search and other schemes. Seems every road we go down has some limitations. So, we've started working on a test suite for load testing the ceilometer storage drivers. The intent is to have a common place to record our findings and compare with the efforts of others. There's an etherpad where we're tracking our results [1] and a test suite that we're building out [2]. The test suite works against a fork of ceilometer where we can keep our experimental storage driver tweaks [3]. The test suite hits the storage drivers directly, bypassing the api, but still uses the ceilometer models. We've added support for dumping the results to statsd/graphite for charting of performance results in real-time. If you're interested in large scale deployments of ceilometer, we would welcome any assistance. Thanks! -Sandy [1] https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing [2] https://github.com/rackerlabs/ceilometer-load-tests [3] https://github.com/rackerlabs/instrumented-ceilometer
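The harness described above (hit the storage drivers directly, time each insert, push the samples to statsd/graphite) can be sketched roughly like this. The FakeStatsd and function names are invented; a real run would use a statsd client and an actual driver's record_events call.

```python
import time

class FakeStatsd:
    """Collects timing samples in memory rather than sending UDP to a
    real statsd daemon; the (metric, milliseconds) shape is what a
    graphite dashboard would chart in real time."""
    def __init__(self):
        self.samples = []

    def timing(self, metric, ms):
        self.samples.append((metric, ms))

def drive_inserts(store_event, events, stats):
    # hit the storage driver directly (bypassing the api), timing
    # each insert so throughput and latency can be charted
    for event in events:
        start = time.perf_counter()
        store_event(event)
        stats.timing("storage.record_events",
                     (time.perf_counter() - start) * 1000.0)

backend = []           # stand-in for a real storage driver
stats = FakeStatsd()
events = [{"event_type": "compute.instance.create.end", "seq": i}
          for i in range(100)]
drive_inserts(backend.append, events, stats)
```

At a million events per day (roughly 12/second sustained, with much higher bursts), per-insert timings like these are what expose whether a given sql/no-sql backend keeps up.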
Re: [openstack-dev] [Ceilometer] storage driver testing
On 11/29/2013 11:41 AM, Julien Danjou wrote: On Fri, Nov 29 2013, Nadya Privalova wrote: I'm very interested in performance results for Ceilometer. Now we have successfully installed Ceilometer in the HA-lab with 200 computes and 3 controllers. Now it works pretty good with MySQL. Our next steps are: What I'd like to know in both your and Sandy's tests, is the number of collectors you are running in parallel. For our purposes we aren't interested in the collector. We're purely testing the performance of the storage drivers and the underlying databases.
Re: [openstack-dev] [Ceilometer] storage driver testing
On 11/29/2013 11:32 AM, Nadya Privalova wrote: Hello Sandy, I'm very interested in performance results for Ceilometer. Now we have successfully installed Ceilometer in the HA-lab with 200 computes and 3 controllers. Now it works pretty good with MySQL. Our next steps are: 1. Configure alarms 2. Try to use Rally for OpenStack performance with MySQL and MongoDB (https://wiki.openstack.org/wiki/Rally) We are open to any suggestions. Awesome, as a group we really need to start an effort like the storage driver tests for ceilometer in general. I assume you're just pulling Samples via the agent? We're really just focused on event storage and retrieval. There seem to be three levels of load testing required: 1. testing through the collectors (either sample or event collection) 2. testing load on the CM api 3. testing the storage drivers. Sounds like you're addressing #1, we're addressing #3 and Tempest integration tests will be handling #2. I should also add that we've instrumented the db and ceilometer hosts using Diamond to statsd/graphite for tracking load on the hosts while the tests are underway. This will help with determining how many collectors we need, where the bottlenecks are coming from, etc. It might be nice to standardize on that so we can compare results? -S Thanks, Nadya On Wed, Nov 27, 2013 at 9:42 PM, Sandy Walsh sandy.wa...@rackspace.com wrote: Hey! We've ballparked that we need to store a million events per day. To that end, we're flip-flopping between sql and no-sql solutions, hybrid solutions that include elastic search and other schemes. Seems every road we go down has some limitations. So, we've started working on a test suite for load testing the ceilometer storage drivers. The intent is to have a common place to record our findings and compare with the efforts of others. There's an etherpad where we're tracking our results [1] and a test suite that we're building out [2].
The test suite works against a fork of ceilometer where we can keep our experimental storage driver tweaks [3]. The test suite hits the storage drivers directly, bypassing the api, but still uses the ceilometer models. We've added support for dumping the results to statsd/graphite for charting of performance results in real-time. If you're interested in large scale deployments of ceilometer, we would welcome any assistance. Thanks! -Sandy [1] https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing [2] https://github.com/rackerlabs/ceilometer-load-tests [3] https://github.com/rackerlabs/instrumented-ceilometer
Re: [openstack-dev] [oslo] maintenance policy for code graduating from the incubator
So, as I mention in the branch, what about deployments that haven't transitioned to the library but would like to cherry pick this feature? "after it starts moving into a library" can leave a very big gap when the functionality isn't available to users. -S From: Eric Windisch [e...@cloudscaling.com] Sent: Friday, November 29, 2013 2:47 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [oslo] maintenance policy for code graduating from the incubator Based on that, I would like to say that we do not add new features to incubated code after it starts moving into a library, and only provide stable-like bug fix support until integrated projects are moved over to the graduated library (although even that is up for discussion). After all integrated projects that use the code are using the library instead of the incubator, we can delete the module(s) from the incubator. +1 Although never formalized, this is how I had expected we would handle the graduation process. It is also how we have been responding to patches and blueprints offering improvements and feature requests for oslo.messaging. -- Regards, Eric Windisch
Re: [openstack-dev] [oslo] maintenance policy for code graduating from the incubator
On 11/29/2013 03:58 PM, Doug Hellmann wrote: On Fri, Nov 29, 2013 at 2:14 PM, Sandy Walsh sandy.wa...@rackspace.com wrote: So, as I mention in the branch, what about deployments that haven't transitioned to the library but would like to cherry pick this feature? "after it starts moving into a library" can leave a very big gap when the functionality isn't available to users. Are those deployments tracking trunk or a stable branch? Because IIUC, we don't add features like this to stable branches for the main components, either, and if they are tracking trunk then they will get the new feature when it ships in a project that uses it. Are you suggesting something in between? Tracking trunk. If the messaging branch has already landed in Nova, then this is a moot discussion. Otherwise we'll still need it in incubator. That said, consider if messaging wasn't in nova trunk. According to this policy the new functionality would have to wait until it was. And, as we've seen with messaging, that was a very long time. That doesn't seem reasonable. Doug -S From: Eric Windisch [e...@cloudscaling.com] Sent: Friday, November 29, 2013 2:47 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [oslo] maintenance policy for code graduating from the incubator Based on that, I would like to say that we do not add new features to incubated code after it starts moving into a library, and only provide stable-like bug fix support until integrated projects are moved over to the graduated library (although even that is up for discussion). After all integrated projects that use the code are using the library instead of the incubator, we can delete the module(s) from the incubator. +1 Although never formalized, this is how I had expected we would handle the graduation process.
It is also how we have been responding to patches and blueprints offering improvements and feature requests for oslo.messaging. -- Regards, Eric Windisch
Re: [openstack-dev] [oslo] maintenance policy for code graduating from the incubator
On 12/01/2013 06:40 PM, Doug Hellmann wrote: On Sat, Nov 30, 2013 at 3:52 PM, Sandy Walsh sandy.wa...@rackspace.com wrote: On 11/29/2013 03:58 PM, Doug Hellmann wrote: On Fri, Nov 29, 2013 at 2:14 PM, Sandy Walsh sandy.wa...@rackspace.com wrote: So, as I mention in the branch, what about deployments that haven't transitioned to the library but would like to cherry pick this feature? "after it starts moving into a library" can leave a very big gap when the functionality isn't available to users. Are those deployments tracking trunk or a stable branch? Because IIUC, we don't add features like this to stable branches for the main components, either, and if they are tracking trunk then they will get the new feature when it ships in a project that uses it. Are you suggesting something in between? Tracking trunk. If the messaging branch has already landed in Nova, then this is a moot discussion. Otherwise we'll still need it in incubator. That said, consider if messaging wasn't in nova trunk. According to this policy the new functionality would have to wait until it was. And, as we've seen with messaging, that was a very long time. That doesn't seem reasonable. The alternative is feature drift between the incubated version of rpc and oslo.messaging, which makes the task of moving the other projects to messaging even *harder*. What I'm proposing seems like a standard deprecation/backport policy; I'm not sure why you see the situation as different. Sandy, can you elaborate on how you would expect to maintain feature parity between the incubator and library while projects are in transition? Deprecation usually assumes there is something in place to replace the old way. If I'm reading this correctly, you're proposing we stop adding to the existing library as soon as the new library has started? Shipping code always wins out.
We can't stop development simply based on the promise that something new is on the way. Leaving the existing code to bug fix only status is far too limiting. In the case of messaging this would have meant an entire release cycle with no new features in oslo.rpc. Until the new code replaces the old, we have to suffer the pain of updating both codebases. Doug -S From: Eric Windisch [e...@cloudscaling.com] Sent: Friday, November 29, 2013 2:47 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [oslo] maintenance policy for code graduating from the incubator Based on that, I would like to say that we do not add new features to incubated code after it starts moving into a library, and only provide stable-like bug fix support until integrated projects are moved over to the graduated library (although even that is up for discussion). After all integrated projects that use the code are using the library instead of the incubator, we can delete the module(s) from the incubator. +1 Although never formalized, this is how I had expected we would handle the graduation process. It is also how we have been responding to patches and blueprints offering improvements and feature requests for oslo.messaging.
-- Regards, Eric Windisch
Re: [openstack-dev] Unified Guest Agent proposal
On 12/06/2013 03:45 PM, Dmitry Mescheryakov wrote: Hello all, We would like to push the discussion on a unified guest agent further. You may find the details of our proposal at [1]. Also let me clarify why we started this conversation. Savanna currently utilizes SSH to install/configure Hadoop on VMs. We were happy with that approach until recently we realized that in many OpenStack deployments VMs are not accessible from the controller. That brought us to the idea of using a guest agent for VM configuration instead. That approach is already used by Trove, Murano and Heat and we can do the same. Uniting the efforts on a single guest agent brings a couple of advantages: 1. Code reuse across several projects. 2. Simplified deployment of OpenStack. A guest agent requires additional facilities for transport like a message queue or something similar. Sharing an agent means projects can share transport/config and hence ease the life of deployers. We see it as a library and we think that Oslo is a good place for it. Naturally, since this is going to be a _unified_ agent we seek input from all interested parties. It might be worthwhile to consider building from the Rackspace guest agents for linux [2] and windows [3]. Perhaps get them moved over to stackforge and scrubbed? These are geared towards Xen, but that would be a good first step in making the HV-Guest pipe configurable. [2] https://github.com/rackerlabs/openstack-guest-agents-unix [3] https://github.com/rackerlabs/openstack-guest-agents-windows-xenserver -S [1] https://wiki.openstack.org/wiki/UnifiedGuestAgent Thanks, Dmitry
Re: [openstack-dev] [taskflow] Recommendations for the granularity of tasks and their stickiness to workers
On 6/17/2014 7:04 AM, Eoghan Glynn wrote: Folks, A question for the taskflow ninjas. Any thoughts on best practice WRT $subject? Specifically I have in mind this ceilometer review[1] which adopts the approach of using very fine-grained tasks (at the level of an individual alarm evaluation) combined with short-term assignments to individual workers. But I'm also thinking of future potential usage of taskflow within ceilometer, to support partitioning of work over a scaled-out array of central agents. Does taskflow also naturally support a model whereby more chunky tasks (possibly including ongoing periodic work) are assigned to workers in a stickier fashion, such that re-balancing of workload can easily be triggered when a change is detected in the pool of available workers? I don't think taskflow today is really focused on load balancing of tasks. Something like gearman [1] might be better suited in the near term? My understanding is that taskflow is really focused on in-process tasks (with retry, restart, etc) and later will support distributed tasks. But my data could be stale too. (jharlow?) Even still, the decision of smaller tasks vs. chunky ones really comes down to how much work you want to re-do if there is a failure. I've seen some uses of taskflow where the breakdown of tasks seemed artificially small. Meaning, the overhead of going back to the library on an undo/rewind is greater than the undo itself. -S [1] http://gearman.org/
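The granularity tradeoff above (how much work gets redone on an undo/rewind) can be illustrated with a toy execute/revert runner. This is not taskflow's actual API, just a minimal stand-in showing why fine-grained tasks redo less work per failure, at the cost of more per-task library overhead.

```python
class Task:
    """Tiny stand-in for a taskflow-style task with execute/revert;
    taskflow's real API differs -- this only illustrates the tradeoff."""
    def __init__(self, name, log, fail=False):
        self.name, self.log, self.fail = name, log, fail

    def execute(self):
        if self.fail:
            raise RuntimeError(self.name)
        self.log.append(("execute", self.name))

    def revert(self):
        self.log.append(("revert", self.name))

def run_flow(tasks):
    """Run tasks in order; on failure, revert completed tasks in
    reverse. The finer-grained the tasks, the less work each revert
    redoes -- but the more per-task overhead accumulates."""
    done = []
    for t in tasks:
        try:
            t.execute()
        except Exception:
            for completed in reversed(done):
                completed.revert()
            raise
        done.append(t)

log = []
try:
    run_flow([Task("evaluate-alarm-1", log),
              Task("evaluate-alarm-2", log),
              Task("evaluate-alarm-3", log, fail=True)])
except RuntimeError:
    pass  # the first two tasks were reverted in reverse order
```

If each task in the flow is trivially cheap (e.g. a single alarm evaluation), the bookkeeping around execute/revert can dominate the work itself, which is the "artificially small" breakdown cautioned against above.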
Re: [openstack-dev] [nova] Why is there a 'None' task_state between 'SCHEDULING' 'BLOCK_DEVICE_MAPPING'?
Nice ... that's always bugged me. From: wu jiang [win...@gmail.com] Sent: Thursday, June 26, 2014 9:30 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Why is there a 'None' task_state between 'SCHEDULING' 'BLOCK_DEVICE_MAPPING'? Hi Phil, Ok, I'll submit a patch to add a new task_state (like 'STARTING_BUILD') in these two days. And related modifications will definitely be added in the Doc. Thanks for your help. :) WingWJ On Thu, Jun 26, 2014 at 6:42 PM, Day, Phil philip@hp.com wrote: What do others think - do we want a spec to add an additional task_state value that will be set in a well defined place? Kind of feels overkill for me in terms of the review effort that would take compared to just reviewing the code - it's not as if there are going to be lots of alternatives to consider here. From: wu jiang [mailto:win...@gmail.com] Sent: 26 June 2014 09:19 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Why is there a 'None' task_state between 'SCHEDULING' 'BLOCK_DEVICE_MAPPING'? Hi Phil, thanks for your reply. So do I need to submit a patch/spec to add it now? On Wed, Jun 25, 2014 at 5:53 PM, Day, Phil philip@hp.com wrote: Looking at this a bit deeper, the comment in _start_building() says that it's doing this to "Save the host and launched_on fields and log appropriately". But as far as I can see those don't actually get set until the claim is made against the resource tracker a bit later in the process, so this whole update might just be not needed - although I still like the idea of a state to show that the request has been taken off the queue by the compute manager. From: Day, Phil Sent: 25 June 2014 10:35 To: OpenStack Development Mailing List Subject: RE: [openstack-dev] [nova] Why is there a 'None' task_state between 'SCHEDULING' 'BLOCK_DEVICE_MAPPING'?
Hi WingWJ, I agree that we shouldn't have a task state of None while an operation is in progress. I'm pretty sure back in the day this didn't use to be the case and task_state stayed as Scheduling until it went to Networking (now of course networking and BDM happen in parallel, so you have to be very quick to see the Networking state). Personally I would like to see the extra granularity of knowing that a request has been started on the compute manager (and knowing that the request was started rather than still sitting on the queue makes the decision to put it into an error state when the manager is re-started more robust). Maybe a task state of "STARTING_BUILD" for this case? BTW I don't think _start_building() is called anymore now that we've switched to conductor calling build_and_run_instance() - but the same task_state issue exists in there as well. From: wu jiang [mailto:win...@gmail.com] Sent: 25 June 2014 08:19 To: OpenStack Development Mailing List Subject: [openstack-dev] [nova] Why is there a 'None' task_state between 'SCHEDULING' 'BLOCK_DEVICE_MAPPING'? Hi all, Recently, some of my instances were stuck in task_state 'None' during VM creation in my environment. So I checked and found there's a 'None' task_state between 'SCHEDULING' and 'BLOCK_DEVICE_MAPPING'. The related code looks like this:

def _start_building():
    self._instance_update(context, instance['uuid'],
                          vm_state=vm_states.BUILDING,
                          task_state=None,
                          expected_task_state=(task_states.SCHEDULING,
                                               None))

So if the compute node is rebooted after that update, all building VMs on it will always stay in the 'None' task_state. That's useless and inconvenient for locating problems. Why not a new task_state for this step?
WingWJ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
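The fix WingWJ agreed to submit can be sketched roughly as follows; all names here are illustrative stand-ins, not actual Nova code:

```python
# Illustrative sketch (not actual Nova code) of the proposed fix:
# set an explicit STARTING_BUILD task_state instead of None when the
# compute manager takes a build request off the queue.

SCHEDULING = 'scheduling'
STARTING_BUILD = 'starting_build'   # the proposed new task_state

class FakeInstance:
    def __init__(self):
        self.task_state = SCHEDULING

def start_building(instance):
    # Previously this update set task_state=None, so a compute-node
    # reboot left building VMs stuck in an ambiguous None state.
    instance.task_state = STARTING_BUILD

def should_error_out_after_reboot(instance):
    # With an explicit state, the manager knows the request was
    # already taken off the queue, so erroring it out is safe.
    return instance.task_state == STARTING_BUILD

instance = FakeInstance()
start_building(instance)
print(instance.task_state)   # starting_build
```

The point is simply that the in-progress window between 'SCHEDULING' and 'BLOCK_DEVICE_MAPPING' carries a named state instead of None.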
Re: [openstack-dev] [oslo][messaging] Further improvements and refactoring
Something to consider is that the create-the-queue-in-advance feature is done for notifications, so we don't drop important messages on the floor by having an Exchange with no associated Queue. For RPC operations, this may not be required (we assume the service is available). If this check is truly a time-sink we could ignore that check for rpc calls. -S On 6/10/2014 9:31 AM, Alexei Kornienko wrote: Hi, Please find some answers inline. Regards, Alexei On 06/10/2014 03:06 PM, Flavio Percoco wrote: On 10/06/14 15:03 +0400, Dina Belova wrote: Hello, stackers! Oslo.messaging is the future of how different OpenStack components communicate with each other, and really I’d love to start a discussion about how we can make this library even better than it is now and how we can refactor it to make it more production-ready. As we all remember, oslo.messaging was initially created as a logical continuation of nova.rpc - as a separate library, with lots of transports supported, etc. That’s why oslo.messaging inherited not only the advantages of how nova.rpc worked (and there were lots of them), but also some architectural decisions that currently sometimes lead to performance issues (we met some of them during Ceilometer performance testing [1] in the Icehouse cycle). For instance, a simple test messaging server (with connection pool and eventlet) can process 700 messages per second. The same functionality implemented using a plain kombu driver (without connection pool and eventlet) processes ten times more - 7000-8000 messages per second. So we have the following suggestions about how we may make this process better and quicker (and really I’d love to collect your feedback, folks): 1) Currently we have the main loop running in the Executor class, and I guess it would be much better to move it to the Server class, as that will make the relationship between the classes simpler and leave the Executor only one task - process the message, and that’s it (in blocking or eventlet mode).
Moreover, this will make further refactoring much easier. To some extent, the executors are part of the server class, since the latter is the one actually controlling them. If I understood your proposal, the server class would implement the event loop, which means we would have an EventletServer / BlockingServer, right? If what I said is what you meant, then I disagree. Executors keep the event loop isolated from other parts of the library and this is really important for us. One of the reasons is to easily support multiple python versions - by having different event loops. Is my assumption correct? Could you elaborate more? No, that's not how we plan it. The Server will do the loop and pass each received message to the dispatcher and executor. It means that we would still have a blocking executor and an eventlet executor in the same server class. We would just change the implementation part to make it more consistent and easier to control. 2) Some of the driver implementations (such as impl_rabbit and impl_qpid, for instance) are full of needlessly separate classes that in reality could be folded into other ones. There are already some changes making the whole structure easier [2], and after the 1st issue is solved, Dispatcher and Listener will also be able to be refactored. This was done on purpose. The idea was to focus on backwards compatibility rather than cleaning up/improving the drivers. That said, it sounds like those drivers could use some clean-up. However, I think we should first extend the test suite a bit more before hacking the existing drivers. 3) If we separate RPC functionality from messaging functionality, it will make the code base cleaner and more easily reused. What do you mean by this? We mean that the current drivers are written with RPC code hardcoded inside (ReplyWaiter, etc.). That's not how a messaging library is supposed to work.
We can move RPC to a separate layer, and this would be beneficial for both RPC (the code will become cleaner and less error-prone) and the core messaging part (we'll be able to implement messaging in a way that works much faster). 4) The connection pool can be refactored to implement more efficient connection reuse. Please, elaborate. What changes do you envision? Currently there is a class called ConnectionContext that is used to manage the pool. Additionally, it can be accessed/configured in several other places. If we refactor it a little it would be much easier to use connections from the pool. As Dims suggested, I think filing some specs for this (and keeping the proposals separate) would help a lot in understanding what the exact plan is. Glad to know you're looking forward to helping improve oslo.messaging. Thanks, Flavio Folks, are you ok with such a plan? Alexey Kornienko already started some of this work [2], but really we want to be sure that we chose the correct vector of development here. Thanks! [1] https://docs.google.com/document/d/
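Proposal (1) above - moving the main loop from the Executor into the Server - can be sketched in miniature like this; the class and method names are hypothetical, not oslo.messaging's actual API:

```python
# Hypothetical miniature of the proposed split: the Server owns the
# receive loop and hands each message to a pluggable executor, whose
# only job is to run the dispatch (blocking here; an eventlet-based
# executor would be another implementation of the same interface).
import queue

class BlockingExecutor:
    def submit(self, dispatch, message):
        dispatch(message)   # process the message inline

class Server:
    def __init__(self, incoming, dispatcher, executor):
        self._incoming = incoming        # stands in for a driver's listener
        self._dispatcher = dispatcher
        self._executor = executor

    def serve(self):
        # The main loop lives in the Server, not in the Executor.
        while True:
            msg = self._incoming.get()
            if msg is None:              # sentinel: stop serving
                break
            self._executor.submit(self._dispatcher, msg)

incoming = queue.Queue()
handled = []
incoming.put({'method': 'ping'})
incoming.put(None)
Server(incoming, handled.append, BlockingExecutor()).serve()
print(handled)   # [{'method': 'ping'}]
```

Swapping BlockingExecutor for an eventlet variant would then change only how dispatch runs, not who owns the loop - which is the consistency the proposal is after.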
Re: [openstack-dev] [oslo][messaging] Further improvements and refactoring
On 6/27/2014 11:27 AM, Alexei Kornienko wrote: Hi, Why should we create a queue in advance? Notifications are used for communicating with downstream systems (which may or may not be online at the time). This includes dashboards, monitoring systems, billing systems, etc. They can't afford to lose these important updates. So, a queue has to exist and the events just build up until they are eaten. RPC doesn't need this though. Let's consider the following use cases: 1) * listener starts and creates a queue * publishers connect to the exchange and start publishing No need to create a queue in advance here, since the listener does it when it starts Right, this is the RPC case. 2) * publishers create a queue in advance and start publishing Creating it is not correct, since there is no guarantee that anyone would ever use this queue... This is why notifications are turned off by default. IMHO the listener should create the queue and publishers should not care about it at all. What do you think? See above. There are definite use-cases where the queue has to be created in advance. But, as I say, RPC isn't one of them. So, for 90% of the AMQP traffic, we don't need this feature. We should be able to disable it for RPC in oslo.messaging. (I say should because I'm not positive some aspect of OpenStack doesn't depend on the queue existing. Thinking about the scheduler mostly.) -S On 06/27/2014 05:16 PM, Sandy Walsh wrote: Something to consider is that the create-the-queue-in-advance feature is done for notifications, so we don't drop important messages on the floor by having an Exchange with no associated Queue. For RPC operations, this may not be required (we assume the service is available). If this check is truly a time-sink we could ignore that check for rpc calls. -S On 6/10/2014 9:31 AM, Alexei Kornienko wrote: Hi, Please find some answers inline. Regards, Alexei On 06/10/2014 03:06 PM, Flavio Percoco wrote: On 10/06/14 15:03 +0400, Dina Belova wrote: Hello, stackers!
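A toy model of the behaviour Sandy describes - an exchange with no bound queue silently dropping messages - helps show why notification queues get declared in advance while RPC can rely on the listener declaring its own queue. This is illustrative code, not real AMQP or kombu semantics:

```python
# Toy broker model (not real AMQP): a message published to an
# exchange with no bound queue is dropped on the floor, which is why
# notification queues are declared in advance while RPC relies on
# the listener declaring its own queue at startup.

class ToyExchange:
    def __init__(self):
        self._queues = {}

    def declare_queue(self, name):
        self._queues.setdefault(name, [])

    def publish(self, routing_key, message):
        q = self._queues.get(routing_key)
        if q is None:
            return False     # no bound queue: message silently lost
        q.append(message)
        return True

ex = ToyExchange()
# Notification case: queue declared up front, consumer may be offline;
# events just build up until they are eaten.
ex.declare_queue('notifications.info')
print(ex.publish('notifications.info',
                 {'event_type': 'compute.instance.create.end'}))  # True
# RPC case: nothing declared yet, so an early publish is simply lost.
print(ex.publish('rpc.compute', {'method': 'ping'}))              # False
```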
Re: [openstack-dev] [oslo] Openstack and SQLAlchemy
woot! From: Mike Bayer [mba...@redhat.com] Sent: Monday, June 30, 2014 1:56 PM To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [oslo] Openstack and SQLAlchemy Hi all - For those who don't know me, I'm Mike Bayer, creator/maintainer of SQLAlchemy, Alembic migrations and Dogpile caching. In the past month I've become a full time Openstack developer working for Red Hat, given the task of carrying Openstack's database integration story forward. To that extent I am focused on the oslo.db project which going forward will serve as the basis for database patterns used by other Openstack applications. I've summarized what I've learned from the community over the past month in a wiki entry at: https://wiki.openstack.org/wiki/Openstack_and_SQLAlchemy The page also refers to an ORM performance proof of concept which you can see at https://github.com/zzzeek/nova_poc. The goal of this wiki page is to publish to the community what's come up for me so far, to get additional information and comments, and finally to help me narrow down the areas in which the community would most benefit by my contributions. I'd like to get a discussion going here, on the wiki, on IRC (where I am on freenode with the nickname zzzeek) with the goal of solidifying the blueprints, issues, and SQLAlchemy / Alembic features I'll be focusing on as well as recruiting contributors to help in all those areas. I would welcome contributors on the SQLAlchemy / Alembic projects directly as well, as we have many areas that are directly applicable to Openstack. I'd like to thank Red Hat and the Openstack community for welcoming me on board and I'm looking forward to digging in more deeply in the coming months! - mike ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
On 7/10/2014 5:52 AM, Eoghan Glynn wrote: TL;DR: do we need to stabilize notifications behind a versioned and discoverable contract? Thanks for dusting this off. Versioning and published schemas for notifications are important to the StackTach team. It would be nice to get this resolved. We're happy to help out. Folks, One of the issues that has been raised in the recent discussions with the QA team about branchless Tempest relates to some legacy defects in the OpenStack notification system. Now, I don't personally subscribe to the PoV that ceilometer, or indeed any other consumer of these notifications (e.g. StackTach), was at fault for going ahead and depending on this pre-existing mechanism without first fixing it. But be that as it may, we have a shortcoming here that needs to be called out explicitly, and possible solutions explored. In many ways it's akin to the un-versioned RPC that existed in nova before the versioned-rpc-apis BP[1] was landed back in Folsom IIRC, except that notification consumers tend to be at arms-length from the producer, and the effect of a notification is generally more advisory than actionable. A great outcome would include some or all of the following: 1. more complete in-tree test coverage of notification logic on the producer side Ultimately this is the core problem. A breaking change in the notifications caused tests to fail in other systems. Should we be adding more tests or simply add version checking at the lower levels (like the first pass of RPC versioning did)? (more on this below) 2. versioned notification payloads to protect consumers from breaking changes in payload format Yep, like RPC the biggies are: 1. removal of fields from notifications 2. change in semantics of a particular field 3. addition of new fields (not a biggie) The urgency for notifications is a little different than RPC where there is a method on the other end expecting a certain format. 
Notification consumers have to be a little more forgiving when things don't come in as expected. This isn't a justification for breaking changes. Just stating that we have some leeway. I guess it really comes down to systems that are using notifications for critical synchronization vs. purely informational. 3. external discoverability of which event types a service is emitting These questions can be saved for later, but ... Is the use-case that a downstream system can learn which queue to subscribe to programmatically? Is this a nice-to-have? Would / should this belong in a metadata service? 4. external discoverability of which event types a service is consuming Isn't this what the topic queues are for? Consumers should only subscribe to the topics they're interested in. If you're thinking that sounds like a substantial chunk of cross-project work co-ordination, you'd be right :) Perhaps notification schemas should be broken out into a separate repo(s)? That way we can test independently of the publishing system. For example, our notigen event simulator [5] could use it. These could just be dependent libraries/plugins to oslo.messaging. So the purpose of this thread is simply to get a read on the appetite in the community for such an effort. At the least it would require: * thrashing out the details in, say, a cross-project-track session at the K* summit * buy-in from the producer-side projects (nova, glance, cinder etc.) in terms of stepping up to make the changes * acquiescence from non-integrated projects that currently consume these notifications (we shouldn't, as good citizens, simply pull the rug out from under projects such as StackTach without discussion upfront) We'll adapt StackTach.v2 accordingly. StackTach.v3 is far less impacted by notification changes since they are offloaded and processed in a secondary step. Breaking changes will just stall the processing. I suspect .v3 will be in place before .v2 is affected.
Adding version handling to Stack-Distiller (our notification-event translator) should be pretty easy (and useful) [6] * dunno if the TC would need to give their imprimatur to such an approach, or whether we could simply self-organize and get it done without the need for governance resolutions etc. Any opinions on how desirable or necessary this is, and how the detailed mechanics might work, would be welcome. A published set of schemas would be very useful for StackTach, we'd love to help out in any way possible. In the near-term we have to press on under the assumption notification definitions are fragile. Apologies BTW if this has already been discussed and rejected as unworkable. I see a stalled versioned-notifications BP[2] and some references to the CADF versioning scheme in the LP fossil-record. Also an inconclusive ML thread from 2012[3], and a related grizzly summit design session[4], but it's unclear to me whether these aspirations got much traction in the
Re: [openstack-dev] [all] Treating notifications as a contract
On 7/10/2014 2:59 PM, Daniel Dyer wrote: From my perspective, the requirement is to be able to have a consistent and predictable format for notifications that are being sent from all services. This means: 1. a set of required fields that all events contain and have consistent meaning 2. a set of optional fields, you don’t have to include these but if you do then you follow the same format and meaning That is the design of notifications [7]. I guess we're debating the schema of the Payload section on a per-event basis. (as opposed to the somewhat loose definitions we have for those sections currently [8]) [7] https://wiki.openstack.org/wiki/NotificationSystem [8] https://wiki.openstack.org/wiki/SystemUsageData 3. versioning of events: the version is updated whenever the required fields are changed. Managing optional fields can be done via a specification Discovery of events would be interesting from an automated testing perspective, but I am not sure how effective this would be for an application actually consuming the events. Not sure how you would make use of enumerating the consumption of events ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
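Daniel's points (1)-(3) could be sketched as an envelope check like the following; the field names follow the SystemUsageData envelope, but the version-handling policy and values are assumptions for illustration:

```python
# Illustrative envelope check: required common fields plus a payload
# version, where added optional fields bump the minor version and
# breaking changes (removed or re-purposed fields) bump the major
# one. The policy here is a sketch, not an agreed-upon convention.

REQUIRED = ('message_id', 'publisher_id', 'event_type',
            'timestamp', 'payload_version', 'payload')

def is_consumable(notification, supported_major=1):
    missing = [f for f in REQUIRED if f not in notification]
    if missing:
        raise ValueError('missing required fields: %s' % missing)
    major, _minor = (int(p) for p in
                     notification['payload_version'].split('.'))
    # A newer minor version is tolerated; a major bump is rejected.
    return major == supported_major

note = {
    'message_id': '9e1afde0-0000-0000-0000-000000000001',
    'publisher_id': 'compute.host1',
    'event_type': 'compute.instance.create.end',
    'timestamp': '2014-07-10 12:00:00',
    'payload_version': '1.2',
    'payload': {'instance_id': 'some-uuid'},
}
print(is_consumable(note))                               # True
print(is_consumable(dict(note, payload_version='2.0')))  # False
```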
Re: [openstack-dev] [all] Treating notifications as a contract
On 7/10/2014 12:10 PM, Chris Dent wrote: On Thu, 10 Jul 2014, Julien Danjou wrote: My initial plan was to leverage a library like voluptuous to do schema-based validation on the sender side. That would allow a receiver to introspect the schema and know the data structure to expect. I didn't think deeply on how to handle versioning, but that should be doable too. It's not clear to me in this discussion what it is that is being versioned, contracted or standardized. Is it each of the many different notifications that various services produce now? Is it the general concept of a notification which can be considered a sample that something like Ceilometer or StackTach might like to consume? The only real differences between a sample and an event are: 1. the size of the context. Host X CPU = 70% tells you nearly everything you need to know. But compute.scheduler.host_selected will require lots of information to tell you why and how host X was selected. The event payload should be atomic and not depend on previous events for context. With samples, the context is sort of implied by the key or queue name. 2. The handling of Samples can be sloppy. If you miss a CPU sample, just wait for the next one. But if you drop an Event, a billing report is going to be wrong or a dependent system loses sync. 3. There are a *lot* more samples emitted than events. Samples are a shotgun blast while events are registered mail. This is why samples don't usually have the schema problems of events. They are so tiny, there's not much to change. Putting a lot of metadata in a sample is generally a bad idea. Leave it to the queue or key name. That said, Monasca is doing some really cool stuff with high-speed sample processing such that the likelihood of dropping a sample is so low that event support should be able to come from the same framework. The difference is simply the size of the payload and whether the system can handle it at volume (quickly and reliably).
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
On 7/15/2014 3:51 AM, Mark McLoughlin wrote: On Fri, 2014-07-11 at 10:04 +0100, Chris Dent wrote: On Fri, 11 Jul 2014, Lucas Alvares Gomes wrote: The data format that Ironic will send was part of the spec proposed and could have been reviewed. I think there's still time to change it tho, if you have a better format talk to Haomeng, who is the person responsible for that work in Ironic, and see if he can change it (We can put up a follow-up patch to fix the spec with the new format as well). But we need to do this ASAP because we want to get it landed in Ironic soon. It was only after doing the work that I realized how it might be an example for the sake of this discussion. As the architecture of Ceilometer currently exists there still needs to be some measure of custom code, even if the notifications are as I described them. However, if we want to take this opportunity to move some of the smarts from Ceilometer into the Ironic code then the paste that I created might be a guide to make it possible: http://paste.openstack.org/show/86071/ So you're proposing that all payloads should contain something like:

    'events': [
        # one or more dicts with something like
        {
            # some kind of identifier for the type of event
            'class': 'hardware.ipmi.temperature',
            'type': '#thing that indicates threshold, discrete, cumulative',
            'id': 'DIMM GH VR Temp (0x3b)',
            'value': '26',
            'unit': 'C',
            'extra': { ... }
        }
    ]

i.e. a class, type, id, value, unit and a space to put additional metadata. This looks like a particular schema for one event-type (let's say foo.sample). It's hard to extrapolate this one schema to a generic set of common metadata applicable to all events. Really the only common stuff we can agree on is the stuff already there: tenant, user, server, message_id, request_id, timestamp, event_type, etc. Side note on using notifications for sample data: 1. you should generate a proper notification when the rules of a sample change (limits, alarms, sources, etc) ... but no actual measurements.
This would be something like an ironic.alarm-rule-change notification or something 2. you should generate a minimal event for the actual samples (CPU-xxx: 70%) that relates to the previous rule-changing notification. And do this on a queue named something like foo.sample. This way, we can keep important notifications in a priority queue and handle them accordingly (since they hold important data), but let the samples get routed across less-reliable transports (like UDP) via the RoutingNotifier. Also, send the samples one-at-a-time and let them either a) drop on the floor (udp) or b) let the aggregator roll them up into something smaller (sliding window, etc). Making these large notifications contain a list of samples means we'd have to store state somewhere on the server until transmission time. Ideally something we wouldn't want to rely on. On the subject of notifications as a contract, calling the additional metadata field 'extra' suggests to me that there are no stability promises being made about those fields. Was that intentional? However on that however, if there's some chance that a large change could happen, it might be better to wait, I don't know. Unlikely that a larger change will be made in Juno - take the small window of opportunity to rationalize Ironic's payload IMHO. Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
On 7/11/2014 6:08 AM, Chris Dent wrote: On Fri, 11 Jul 2014, Lucas Alvares Gomes wrote: The data format that Ironic will send was part of the spec proposed and could have been reviewed. I think there's still time to change it tho, if you have a better format talk to Haomeng, who is the person responsible for that work in Ironic, and see if he can change it (We can put up a follow-up patch to fix the spec with the new format as well). But we need to do this ASAP because we want to get it landed in Ironic soon. It was only after doing the work that I realized how it might be an example for the sake of this discussion. As the architecture of Ceilometer currently exists there still needs to be some measure of custom code, even if the notifications are as I described them. However, if we want to take this opportunity to move some of the smarts from Ceilometer into the Ironic code then the paste that I created might be a guide to make it possible: http://paste.openstack.org/show/86071/ However on that however, if there's some chance that a large change could happen, it might be better to wait, I don't know. Just to give a sense of what we're dealing with, a while back I wrote a little script to dump the schema of all events StackTach collected from Nova. The value fields are replaced with types (or ? if it was a class object). http://paste.openstack.org/show/54140/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Ceilometer] Generate Event or Notification in Ceilometer
If all you want to do is publish a notification you can use oslo.messaging directly. Or, for something lighter weight, we have Notabene, which is a small wrapper on Kombu. An example of how our notification simulator/generator uses it is available here: https://github.com/StackTach/notigen/blob/master/bin/event_pump.py Of course, you'll have to ensure you fabricate a proper event payload. Hope it helps -S From: Duan, Li-Gong (Gary@HPServers-Core-OE-PSC) [li-gong.d...@hp.com] Sent: Tuesday, July 29, 2014 6:05 AM To: openstack-dev@lists.openstack.org Subject: [openstack-dev] [Ceilometer] Generate Event or Notification in Ceilometer Hi Folks, Are there any guides or examples to show how to produce a new event or notification and add a handler for this event in ceilometer? I am asked to implement OpenStack service monitoring which will send an event and trigger the handler once a service, say nova-compute, crashes, in a short time. :( The link (http://docs.openstack.org/developer/ceilometer/events.html) does a good job on the explanation of the concept, and hence I know that I need to emit notifications to the message queue and ceilometer-collector will process them and generate events, but it is far from a real implementation. Regards, Gary ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
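Whichever publisher is used (oslo.messaging, Notabene, plain kombu), the "proper event payload" Sandy mentions is the standard notification envelope. A minimal sketch of fabricating one by hand follows; the publisher_id and event_type values are made up for the service-monitoring use case, not an existing Ceilometer event type:

```python
# Hedged sketch: build the standard notification envelope (per the
# SystemUsageData wiki format) before handing it to a publisher.
# The publisher_id / event_type values here are illustrative only.
import uuid
import datetime

def make_notification(publisher_id, event_type, payload,
                      priority='INFO'):
    return {
        'message_id': str(uuid.uuid4()),
        'publisher_id': publisher_id,
        'event_type': event_type,
        'priority': priority,
        'timestamp': str(datetime.datetime.utcnow()),
        'payload': payload,
    }

note = make_notification('servicemonitor.host1', 'service.down',
                         {'service': 'nova-compute', 'host': 'host1'})
print(note['event_type'])   # service.down
```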
Re: [openstack-dev] Payload within RabbitMQ messages for Nova related exchanges
On 04/15/2014 10:07 AM, George Monday wrote: Hey there, I've got a quick question about the RabbitMQ exchanges. We are writing listeners for the RabbitMQ exchanges. The basic information about the tasks like compute.instance.create.[start|stop] etc. as stored in the 'payload' attribute of the json message is my concern at the moment. Does this follow a certain predefined structure that's consistent for the lifetime of, say, a specific nova api version? Will this change in major releases (from havana to icehouse)? Is this subject to change without notice? Is there a definition available somewhere? Like for the api versions? In short, how reliable is the json structure of the payload attribute in a rabbitMQ message? We just want to make sure that an update to the OpenStack controller wouldn't break our listeners. Hey George, Most of the notifications are documented here https://wiki.openstack.org/wiki/SystemUsageData But, you're correct that there is no versioning on these currently, though there are some efforts to fix this (specifically around CADF-support) Here's some more info on notifications if you're interested: http://www.sandywalsh.com/2013/09/notification-usage-in-openstack-report.html Hope it helps! -S My Best, George ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
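Until the payload format is versioned, listeners are safer extracting fields defensively rather than assuming one layout; an illustrative sketch (the alternate key names are examples, not an exhaustive or authoritative list):

```python
# Defensive extraction sketch for an unversioned payload: probe the
# known spellings of a field rather than assume a single one, and
# degrade gracefully when it is absent. Key names are illustrative.

def get_instance_id(notification):
    payload = notification.get('payload') or {}
    for key in ('instance_id', 'instance_uuid', 'uuid'):
        if key in payload:
            return payload[key]
    return None   # absent: log and skip rather than crash

print(get_instance_id({'payload': {'instance_uuid': 'abc-123'}}))  # abc-123
print(get_instance_id({'payload': {}}))                            # None
```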
Re: [openstack-dev] How to re-compile Devstack Code
Also, I find setting this in my localrc/local.conf helps debugging:

    # get an actual log file vs. screen scrollback
    LOGFILE=/opt/stack/logs/stack.sh.log
    # gimme all the info
    VERBOSE=True
    # don't pull from git every time I run stack.sh
    RECLONE=False
    # make the logs readable
    LOG_COLOR=False

From: shiva m [anjane...@gmail.com] Sent: Thursday, April 24, 2014 2:42 AM To: openstack-dev@lists.openstack.org Subject: [openstack-dev] How to re-compile Devstack Code Hi, I have a Devstack havana setup on Ubuntu 13.10. I am trying to modify some files in the /opt/stack/* folder. How do I re-compile Devstack to make my changes take effect? Does unstacking and stacking work? I see unstacking and stacking installs everything fresh. Correct me if wrong. Thanks, Shiva ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Monitoring as a Service
On 5/6/2014 10:04 AM, Thierry Carrez wrote: John Dickinson wrote: One of the advantages of the program concept within OpenStack is that separate code projects with complementary goals can be managed under the same program without needing to be the same codebase. The most obvious example across every program is the server and client projects under most programs. This may be something that can be used here, if it doesn't make sense to extend the ceilometer codebase itself. +1 Being under the Telemetry umbrella lets you make the right technical decision between same or separate codebase, as both would be supported by the organizational structure. It also would likely give you an initial set of contributors interested in the same end goals. So at this point I'd focus on engaging with the Telemetry program folks and see if they would be interested in that capability (inside or outside of the Ceilometer code base). This is interesting. I'd be curious to know more about what "managed" means in this situation. Is the core project expected to allocate time in the IRC meeting to the concerns of these adjacent projects? What if the core project doesn't agree with the direction or deems there's too much overlap? Does the core team instantly have sway over the adjacent project? Or does it simply mean we tag ML discussions with [Telemetry] and people can filter accordingly? I mean, this all sounds good in theory, but I'd like to know more about the practical implementation of it. Related client and server projects seem like the low-hanging fruit. -S Cheers, ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Monitoring as a Service
On 5/6/2014 1:48 PM, Thierry Carrez wrote: Sandy Walsh wrote: I'd be curious to know more about what "managed" means in this situation. Is the core project expected to allocate time in the IRC meeting to the concerns of these adjacent projects? What if the core project doesn't agree with the direction or deems there's too much overlap? Does the core team instantly have sway over the adjacent project? It has to be basically the same team of people working on the two projects. The goals and the direction of the project are shared. There is no way it can work if you consider some core and some adjacent; that would quickly create an us vs. them mentality and not work out that well in reviews. Of course, there can be contributors that are not interested in one project or another. But if you end up with completely-separated subteams, then there is little value in living under the same umbrella and sharing a core team. Ok, that's what I thought. Thanks for the clarification. -S ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Treating notifications as a contract
From: Chris Dent [chd...@redhat.com] Tuesday, October 07, 2014 12:07 PM On Wed, 3 Sep 2014, Sandy Walsh wrote: Good goals. When Producer and Consumer know what to expect, things are good ... I know to find the Instance ID here. When the consumer wants to deal with a notification as a generic object, things get tricky (find the instance ID in the payload, What is the image type?, Is this an error notification?) Basically, how do we define the principal artifacts for each service and grant the consumer easy/consistent access to them? (like the 7-W's above) I'd really like to find a way to solve that problem. Is that a good summary? What did I leave out or get wrong? Great start! Let's keep it simple and do-able. Has there been any further thinking on these topics? Summit is soon and kilo specs are starting, so I imagine more people than just me are hoping to get rolling on plans. If there is going to be a discussion at summit I hope people will be good about keeping some artifacts for those of us watching from afar. It seems to me that if the notifications ecosystem becomes sufficiently robust and resilient we ought to be able to achieve some interesting scale and distributed-ness opportunities throughout OpenStack, not just in telemetry/metering/eventing (choose your term of art). Haven't had any time to get anything written down (pressing deadlines with StackTach.v3) but open to suggestions. Perhaps we should just add something to the oslo.messaging etherpad to find time at the summit to talk about it? -S
Re: [openstack-dev] Treating notifications as a contract
From: Sandy Walsh [sandy.wa...@rackspace.com] Tuesday, October 07, 2014 6:07 PM Haven't had any time to get anything written down (pressing deadlines with StackTach.v3) but open to suggestions. Perhaps we should just add something to the oslo.messaging etherpad to find time at the summit to talk about it? -S Actually, that's not really true. The Monasca team has been playing with schema definitions for their wire format (a variation on the kind of notification we ultimately want). And http://apiary.io/ is introducing support for structured schemas soon. Perhaps we can start with some schema proposals there? JSON-Schema based? For green-field installations, CADF is a possibility, but for already established services we will need to document what's in place first. At some point we'll need a cross-project effort to identify all the important characteristics of the various services. Also, we've been finding no end of problems with the wild-west payload section. For example, look at all the different places we have to look to find the instance UUID from Nova. https://github.com/SandyWalsh/stacktach-sandbox/blob/verifier/winchester/event_definitions.yaml#L12-L17 Likewise for project_id, flavor_id, deleted_at, etc. Definitely need a solution to this.
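To make the wild-west payload problem concrete, here is a minimal Python sketch of what consumers end up writing today just to find the instance UUID. This is not actual StackTach or Nova code; the event-type patterns and payload keys are illustrative, loosely modeled on the event_definitions.yaml linked above:

```python
import fnmatch

# Map of event_type pattern -> candidate payload keys where the UUID
# might live. The patterns and keys are illustrative assumptions.
UUID_PATHS = {
    "compute.instance.*": ["instance_id", "instance_uuid"],
    "scheduler.run_instance.*": ["instance_id"],
}

def extract_instance_id(event_type, payload):
    """Try each known location for the instance UUID."""
    for pattern, keys in UUID_PATHS.items():
        if fnmatch.fnmatch(event_type, pattern):
            for key in keys:
                if key in payload:
                    return payload[key]
    return None

notification = {"event_type": "compute.instance.create.end",
                "payload": {"instance_uuid": "abc-123"}}
print(extract_instance_id(notification["event_type"],
                          notification["payload"]))  # abc-123
```

With an agreed schema, this lookup table (and its equivalents for project_id, flavor_id, deleted_at, etc.) would collapse to a single well-known field.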
Re: [openstack-dev] Treating notifications as a contract
From: Doug Hellmann [d...@doughellmann.com] Tuesday, October 14, 2014 7:19 PM It might be more appropriate to put it on the cross-project session list: https://etherpad.openstack.org/p/kilo-crossproject-summit-topics Done ... thanks!
Re: [openstack-dev] [oslo] request_id deprecation strategy question
Does this mean we're losing request-id's? Will they still appear in the Context objects? And there was the effort to keep consistent request-id's in cross-service requests; will this deprecation affect that? -S From: Steven Hardy [sha...@redhat.com] Sent: Monday, October 20, 2014 10:58 AM To: openstack-dev@lists.openstack.org Subject: [openstack-dev] [oslo] request_id deprecation strategy question Hi all, I have a question re the deprecation strategy for the request_id module, which was identified as a candidate for removal in Doug's recent message[1], as it's moved from oslo-incubator to oslo.middleware. The problem I see is that oslo-incubator deprecated this in Juno, but (AFAICS) all projects shipped Juno without the versionutils deprecation warning sync'd [2]. Thus, we can't remove the local openstack.common.middleware.request_id; otherwise, operators upgrading from Juno to Kilo without changing their api-paste.ini files will experience breakage without any deprecation warning. I'm sure I've read and been told that all backwards-incompatible config file changes require a deprecation period of at least one cycle, so does this mean all projects just sync the Juno oslo-incubator request_id into their kilo trees, leave it there until kilo releases, while simultaneously switching their API configs to point to oslo.middleware? Guidance on how to proceed would be great, if folks have thoughts on how best to handle this. Thanks! Steve [1] http://lists.openstack.org/pipermail/openstack-dev/2014-October/048303.html [2] https://github.com/openstack/oslo-incubator/blob/stable/juno/openstack/common/middleware/request_id.py#L33
Re: [openstack-dev] [oslo] request_id deprecation strategy question
Phew :) Thanks Steve. From: Steven Hardy [sha...@redhat.com] Sent: Monday, October 20, 2014 12:52 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [oslo] request_id deprecation strategy question On Mon, Oct 20, 2014 at 02:17:54PM +, Sandy Walsh wrote: Does this mean we're losing request-id's? No, it just means the implementation has moved from oslo-incubator[1] to oslo.middleware[2]. The issue I'm highlighting is that those projects using the code now have to update their api-paste.ini files to import from the new location, presumably while giving some warning to operators about the impending removal of the old code. All I'm seeking to clarify is the most operator-sensitive way to handle this transition, given that we seem to have missed the boat on including a nice deprecation warning for Juno. Steve [1] https://github.com/openstack/oslo-incubator/blob/stable/juno/openstack/common/middleware/request_id.py#L33 [2] https://github.com/openstack/oslo.middleware/blob/master/oslo/middleware/request_id.py
Re: [openstack-dev] [all] How can we get more feedback from users?
Nice work Angus ... great idea. Would love to see more of this. -S From: Angus Salkeld [asalk...@mirantis.com] Sent: Friday, October 24, 2014 1:32 AM To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [all] How can we get more feedback from users? Hi all I have felt some grumblings about usability issues with Heat templates/client/etc. and wanted a way that users could come and give us feedback easily (low barrier). I started an etherpad (https://etherpad.openstack.org/p/heat-useablity-improvements) - the first win is it is spelt wrong :-O We now have some great feedback there in a very short time, most of which we should be able to solve. This led me to think: should OpenStack have a more general mechanism for users to provide feedback? The idea is this is not for bugs or support, but for users to express pain points, requests for features and docs/howtos. It's not easy to improve your software unless you are listening to your users. Ideas? -Angus
[openstack-dev] StackTach users?
Hey y'all! I'm taking a page from Angus and trying to pull together a list of StackTach users. We're moving quickly on our V3 implementation and I'd like to ensure we're addressing the problems you've faced/are facing with older versions. For example, I know initial setup has been a concern and we're starting with an ansible installer in V3. Would that help? We're also ditching the web gui (for now) and buffing up the REST API and client tools. Is that a bad thing? Feel free to contact me directly if you don't like the public forums. Or we can chat at the summit. Cheers! -S
Re: [openstack-dev] [Ceilometer] Notifications as a contract summit prep
Thanks ... we'll be sure to address your concerns. And there's the list we've compiled here: https://etherpad.openstack.org/p/kilo-crossproject-summit-topics (section 4) -S From: Chris Dent [chd...@redhat.com] Sent: Friday, October 24, 2014 2:45 PM To: OpenStack-dev@lists.openstack.org Subject: [openstack-dev] [Ceilometer] Notifications as a contract summit prep Since I'm not going to be at summit and since I care about notifications I was asked to write down some thoughts prior to summit so my notions didn't get missed. The notes are at: https://tank.peermore.com/tanks/cdent-rhat/SummitNotifications TL;DR: make sure that adding new stuff (producers, consumers, notifications) is easy. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
[openstack-dev] Summit Recap: Notification Schema
https://etherpad.openstack.org/p/kilo-crossproject-notifications The big takeaways:
1. We want the schemas to be external so other languages can utilize them.
2. JSON-Schema seems fine, but AVRO has traction in the Big Data world and should be considered.
3. The challenge of having text-file-based schemas is how to make them available for CI and deployments. Packaging problems. There is no simple pip install for text files. Talked about the possibility of making them available by the service API itself or exposing their location via a Service Catalog entry.
4. There are a lot of other services that need a solution to this problem. Monasca needs to define a message bus schema. Nova Objects has its own for RPC calls. It would be nice to solve this problem once.
5. The CADF group is very open to making changes to the spec to accommodate our needs. Regardless, we need a way to transform existing notifications to whatever the new format is. So, we not only need a schema definition grammar, but we will also need a transformation grammar out of the gate for backwards compatibility.
6. Like Nova Objects, it would be nice to make a single smart schema object that can read a schema file and become that object with proper setters and getters (and validation, version up-conversion/down-conversion, etc.).
7. If we can nail down the schema grammar, the transformation grammar and perhaps the schema object in Kilo, we can start to promote it for adoption in the L release.
8. People should be freed up to work on this around Kilo-2 (new year).
Lots of other details in the etherpad. It would be good to arrange a meeting soon to discuss the schema grammar again. And how to distribute the schemas in test and prod env's. Perhaps come up with some concrete recommendations.
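Takeaway 6 could look something like the following minimal Python sketch. This is an assumption about the design, not an agreed one; the inline schema dict stands in for a real JSON-Schema or AVRO file read from disk:

```python
# A simplified schema definition; in practice this would be loaded
# from a versioned text file, not hard-coded.
SCHEMA = {
    "name": "compute.instance.create.end",
    "version": "1.0",
    "fields": {"instance_id": str, "memory_mb": int},
}

class SchemaObject:
    """Reads a schema definition and becomes an object with validated
    attributes for each declared field."""

    def __init__(self, schema, **values):
        self._schema = schema
        for field, ftype in schema["fields"].items():
            value = values.get(field)
            if value is not None and not isinstance(value, ftype):
                raise TypeError("%s must be %s" % (field, ftype.__name__))
            setattr(self, field, value)

    @property
    def version(self):
        return self._schema["version"]

event = SchemaObject(SCHEMA, instance_id="abc-123", memory_mb=512)
print(event.instance_id, event.version)  # abc-123 1.0
```

A real implementation would also hang version up/down-conversion methods off the same object, as the recap suggests.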
[openstack-dev] Where should Schema files live?
Hey y'all, To avoid cross-posting, please inform your -infra / -operations buddies about this post. We've just started thinking about where notification schema files should live and how they should be deployed. Kind of a tricky problem. We could really use your input on this problem ... The assumptions:
1. Schema files will be text files. They'll live in their own git repo (stackforge for now, ideally oslo eventually).
2. Unit tests will need access to these files for local dev.
3. Gating tests will need access to these files for integration tests.
4. Many different services are going to want to access these files during staging and production.
5. There are going to be many different versions of these files. There are going to be a lot of schema updates.
Some problems / options:
a. Unlike Python, there is no simple pip install for text files. No version control per se. Basically whatever we pull from the repo. The problem with a git clone is we need to tweak config files to point to a directory and that's a pain for gating tests and CD. Could we assume a symlink to some well-known location?
a': I suppose we could make a python installer for them, but that's a pain for other language consumers.
b. In production, each openstack service could expose the schema files via their REST API, but that doesn't help gating tests or unit tests. Also, this means every service will need to support exposing schema files. Big coordination problem.
c. In production, we could add an endpoint to the Keystone Service Catalog to each schema file. This could come from a separate metadata-like service. Again, yet-another-service to deploy and make highly available.
d. Should we make separate distro packages? Install to a well-known location all the time? This would work for local dev and integration testing and we could fall back on B and C for production distribution. Of course, this will likely require people to add a new distro repo. Is that a concern?
Personally, I'm leaning towards option D, but I'm not sure what the implications are. We're early in thinking about these problems, but would like to start the conversation now to get your opinions. Look forward to your feedback. Thanks -Sandy
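As a rough illustration of option (d) (and the symlink idea in option (a)), the consumer-side lookup might be no more than the following Python sketch. The directory name and environment variable are assumptions for illustration only, not an agreed convention:

```python
import os

# Assumed well-known install location for distro packages, with an
# env-var override for local dev and gating tests.
WELL_KNOWN = "/usr/share/openstack/notification-schemas"

def schema_path(service, name):
    """Resolve the on-disk path of a schema file for a service."""
    base = os.environ.get("NOTIFICATION_SCHEMA_DIR", WELL_KNOWN)
    return os.path.join(base, service, name + ".json")

print(schema_path("nova", "compute.instance.create.end"))
```

The appeal of a fixed location is that no config files need tweaking across unit tests, gate jobs and production.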
Re: [openstack-dev] Where should Schema files live?
From: Doug Hellmann [d...@doughellmann.com] Thursday, November 20, 2014 3:51 PM On Nov 20, 2014, at 8:12 AM, Sandy Walsh sandy.wa...@rackspace.com wrote: Hey y'all, To avoid cross-posting, please inform your -infra / -operations buddies about this post. We've just started thinking about where notification schema files should live and how they should be deployed. Kind of a tricky problem. We could really use your input on this problem ... The assumptions: 1. Schema files will be text files. They'll live in their own git repo (stackforge for now, ideally oslo eventually). Why wouldn’t they live in the repo of the application that generates the notification, like we do with the database schema and APIs defined by those apps? That would mean downstream consumers (potentially in different languages) would need to pull all repos and extract just the schema parts. A separate repo would make it more accessible.
Re: [openstack-dev] Where should Schema files live?
From: Eoghan Glynn [egl...@redhat.com] Thursday, November 20, 2014 5:34 PM Some questions/observations inline. Hey y'all, To avoid cross-posting, please inform your -infra / -operations buddies about this post. We've just started thinking about where notification schema files should live and how they should be deployed. Kind of a tricky problem. We could really use your input on this problem ... The assumptions: 1. Schema files will be text files. They'll live in their own git repo (stackforge for now, ideally oslo eventually). 2. Unit tests will need access to these files for local dev 3. Gating tests will need access to these files for integration tests 4. Many different services are going to want to access these files during staging and production. 5. There are going to be many different versions of these files. There are going to be a lot of schema updates. Some problems / options: a. Unlike Python, there is no simple pip install for text files. No version control per se. Basically whatever we pull from the repo. The problem with a git clone is we need to tweak config files to point to a directory and that's a pain for gating tests and CD. Could we assume a symlink to some well-known location? a': I suppose we could make a python installer for them, but that's a pain for other language consumers. Would it be unfair to push that burden onto the writers of clients in other languages? i.e. OpenStack, being largely python-centric, would take responsibility for both: 1. Maintaining the text versions of the schema in-tree (e.g. as json) and: 2. Producing a python-specific installer based on #1 whereas, the first Java-based consumer of these schema would take #1 and package it up in their native format, i.e. as a jar or OSGi bundle. Certainly an option. My gut says it will lead to abandoned/fragmented efforts. If I was a ruby developer, would I want to take on the burden of maintaining yet another package? 
I think we need to treat this data as a form of API, and so it's our responsibility to make it easily consumable. (I'm not hard-line on this, again, just my gut feeling) b. In production, each openstack service could expose the schema files via their REST API, but that doesn't help gating tests or unit tests. Also, this means every service will need to support exposing schema files. Big coordination problem. I kind of liked this schemaURL endpoint idea when it was first mooted at summit. The attraction for me was that it would allow the consumer of the notifications to always have access to the actual version of the schema currently used on the emitter side, independent of the (possibly out-of-date) version of the schema that the consumer has itself installed locally via a static dependency. However IIRC there were also concerns expressed about the churn during some future rolling upgrades - i.e. if some instances of the nova-api schemaURL endpoint are still serving out the old schema, after others in the same deployment have already been updated to emit the new notification version. Yeah, I like this idea too. In the production / staging phase this seems like the best route. The local dev / testing situation seems to be the real tough nut to crack. WRT rolling upgrades we have to ensure we update the service catalog first, the rest should be fine. c. In production, we could add an endpoint to the Keystone Service Catalog to each schema file. This could come from a separate metadata-like service. Again, yet-another-service to deploy and make highly available. Also to {puppetize|chef|ansible|...}-ize. Yeah, agreed, we probably don't want to go down that road. Which is kinda unfortunate since it's the lowest impact on other projects. d. Should we make separate distro packages? Install to a well-known location all the time? This would work for local dev and integration testing and we could fall back on B and C for production distribution.
Of course, this will likely require people to add a new distro repo. Is that a concern? Quick clarification ... when you say distro packages, do you mean Linux-distro-specific package formats such as .rpm or .deb? Yep. Cheers, Eoghan Thanks for the feedback!
Re: [openstack-dev] Where should Schema files live?
From: Doug Hellmann [d...@doughellmann.com] Thursday, November 20, 2014 5:09 PM On Nov 20, 2014, at 3:40 PM, Sandy Walsh sandy.wa...@rackspace.com wrote: From: Doug Hellmann [d...@doughellmann.com] Thursday, November 20, 2014 3:51 PM On Nov 20, 2014, at 8:12 AM, Sandy Walsh sandy.wa...@rackspace.com wrote: The assumptions: 1. Schema files will be text files. They'll live in their own git repo (stackforge for now, ideally oslo eventually). Why wouldn’t they live in the repo of the application that generates the notification, like we do with the database schema and APIs defined by those apps? That would mean downstream consumers (potentially in different languages) would need to pull all repos and extract just the schema parts. A separate repo would make it more accessible. OK, fair. Could we address that by publishing the schemas for an app in a tarball using a post-merge job? That's something to consider. At first blush it feels a little clunky to pull all projects to extract schemas whenever any of the projects change. But there is something to be said about having the schema files next to the code that's going to generate the data.
Re: [openstack-dev] Where should Schema files live?
From: Eoghan Glynn [egl...@redhat.com] Friday, November 21, 2014 11:03 AM Some problems / options: a. Unlike Python, there is no simple pip install for text files. No version control per se. Basically whatever we pull from the repo. The problem with a git clone is we need to tweak config files to point to a directory and that's a pain for gating tests and CD. Could we assume a symlink to some well-known location? a': I suppose we could make a python installer for them, but that's a pain for other language consumers. Would it be unfair to push that burden onto the writers of clients in other languages? i.e. OpenStack, being largely python-centric, would take responsibility for both: 1. Maintaining the text versions of the schema in-tree (e.g. as json) and: 2. Producing a python-specific installer based on #1 whereas, the first Java-based consumer of these schema would take #1 and package it up in their native format, i.e. as a jar or OSGi bundle. I think Doug's suggestion of keeping the schema files in-tree and pushing them to a well-known tarball maker in a build step is best so far. It's still a little clunky, but not as clunky as having to sync two repos. [snip] d. Should we make separate distro packages? Install to a well known location all the time? This would work for local dev and integration testing and we could fall back on B and C for production distribution. Of course, this will likely require people to add a new distro repo. Is that a concern? Quick clarification ... when you say distro packages, do you mean Linux-distro-specific package formats such as .rpm or .deb? Yep. So that would indeed work, but just to sound a small note of caution that keeping an oft-changing package (assumption #5) up-to-date for fedora20/21 epel6/7, or precise/trusty, would involve some work. I don't know much about the Debian/Ubuntu packaging pipeline, in particular how it could be automated. 
But in my small experience of Fedora/EL packaging, the process is somewhat resistant to many fine-grained updates. Ah, good to know. So, if we go with the tarball approach, we should be able to avoid this. And it allows the service to easily serve up the schema using their existing REST API. Should we proceed under the assumption we'll push to a tarball in a post-build step? It could change if we find it's too messy. -S
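The post-build tarball step being discussed could be as small as the following Python sketch. The artifact naming and directory layout here are assumptions, not an agreed convention:

```python
import os
import tarfile

def publish_schemas(schema_dir, version, out_dir):
    """Bundle an in-tree schemas/ directory into a versioned tarball
    that consumers in any language can fetch from a well-known URL."""
    out = os.path.join(out_dir, "nova-schemas-%s.tar.gz" % version)
    with tarfile.open(out, "w:gz") as tar:
        # arcname keeps the paths inside the tarball stable no matter
        # where the source repo was checked out
        tar.add(schema_dir, arcname="schemas")
    return out
```

A CI post-merge job would run this and upload the result; local dev and gate jobs would simply unpack the same artifact.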
Re: [openstack-dev] [nova] is there a way to simulate thousands or millions of compute nodes?
From: Michael Still [mi...@stillhq.com] Thursday, November 27, 2014 6:57 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] is there a way to simulate thousands or millions of compute nodes? I would say that supporting millions of compute nodes is not a current priority for nova... We are actively working on improving support for thousands of compute nodes, but that is via cells (so each nova deploy except the top is still in the hundreds of nodes). ramble on Agreed, it wouldn't make much sense to simulate this on a single machine. That said, if one *was* to simulate this, there are the well known bottlenecks: 1. the API. How much can one node handle with given hardware specs? Which operations hit the DB the hardest? 2. the Scheduler. There's your API bottleneck and big load on the DB for Create operations. 3. the Conductor. Shouldn't be too bad, essentially just a proxy. 4. child-to-global-cell updates. Assuming a two-cell deployment. 5. the virt driver. YMMV. ... and that's excluding networking, volumes, etc. The virt driver should be load tested independently. So FakeDriver would be fine (with some delays added for common operations as Gareth suggests). Something like Bees-with-MachineGuns could be used to get a baseline metric for the API. Then it comes down to DB performance in the scheduler and conductor (for a single cell). Finally, inter-cell loads. Who blows out the queue first? All-in-all, I think you'd be better off load testing each piece independently on a fixed hardware platform and faking out all the incoming/outgoing services. Test the API with fake everything. Test the Scheduler with fake API calls and fake compute nodes. Test the conductor with fake compute nodes (not FakeDriver). Test the compute node directly. Probably all going to come down to the DB and I think there is some good performance data around that already? But I'm just spit-ballin' ... 
and I agree, not something I could see the Nova team taking on in the near term ;) -S
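The FakeDriver-with-delays idea mentioned above might be sketched like this in Python. The method names are illustrative, not Nova's actual virt driver interface:

```python
import time

class DelayingFakeDriver:
    """Stand-in virt driver: sleeps a configurable time per operation so
    the control plane (API/scheduler/conductor) can be load-tested
    without real hypervisors."""

    def __init__(self, spawn_delay=0.5, destroy_delay=0.2):
        self.spawn_delay = spawn_delay
        self.destroy_delay = destroy_delay
        self.instances = set()

    def spawn(self, instance_id):
        time.sleep(self.spawn_delay)   # simulate hypervisor work
        self.instances.add(instance_id)

    def destroy(self, instance_id):
        time.sleep(self.destroy_delay)
        self.instances.discard(instance_id)

driver = DelayingFakeDriver(spawn_delay=0.01, destroy_delay=0.01)
driver.spawn("abc-123")
print(sorted(driver.instances))  # ['abc-123']
```

Tuning the delays per operation lets you model slow hypervisors and see which control-plane component saturates first.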
Re: [openstack-dev] Where should Schema files live?
From: Duncan Thomas [duncan.tho...@gmail.com] Sent: Sunday, November 30, 2014 5:40 AM To: OpenStack Development Mailing List Subject: Re: [openstack-dev] Where should Schema files live? Duncan Thomas On Nov 27, 2014 10:32 PM, Sandy Walsh sandy.wa...@rackspace.com wrote: We were thinking each service API would expose their schema via a new /schema resource (or something). Nova would expose its schema. Glance its own. etc. This would also work well for installations still using older deployments. This feels like externally exposing info that need not be external (since the notifications are not external to the deploy), and it sounds like it will potentially leak fine-grained version and maybe deployment config details that you don't want to make public - either for commercial reasons or to make targeted attacks harder. Yep, good point. Makes a good case for standing up our own service or just relying on the tarballs being in a well-known place. Thanks for the feedback. -S
Re: [openstack-dev] Event Service
https://wiki.openstack.org/wiki/SystemUsageData Both Ceilometer and StackTach can be used to consume these notifications. https://github.com/openstack/ceilometer https://github.com/rackerlabs/stacktach (StackTach functionality is slowly being merged into Ceilometer) Hope it helps! -S From: Michael Still [mi...@stillhq.com] Sent: Friday, July 12, 2013 10:38 PM To: OpenStack Development Mailing List Subject: Re: [openstack-dev] Event Service OpenStack has a system called notifications which does what you're looking for. I've never used it, but I am sure it's documented. Cheers, Michael On Sat, Jul 13, 2013 at 10:12 AM, Qing He qing...@radisys.com wrote: All, Does OpenStack have a pub/sub event service? I would like to be notified of the event of VM creation/deletion/migration, etc. What is the best way to do this? Thanks, Qing -- Rackspace Australia
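For reference, notifications in the SystemUsageData format are JSON dicts carrying event_type, publisher_id, timestamp and payload fields. A consumer that only cares about VM lifecycle events might filter them like this minimal Python sketch (the payload keys shown are illustrative; where fields live varies by event, which is exactly the pain point raised elsewhere in this thread):

```python
# Lifecycle events the original poster asked about (create/delete/etc.)
INTERESTING = ("compute.instance.create.end",
               "compute.instance.delete.end",
               "compute.instance.resize.end")

def handle(notification):
    """Return a summary line for interesting events, else None."""
    if notification.get("event_type") in INTERESTING:
        return "%s: %s" % (notification["event_type"],
                           notification["payload"].get("instance_id"))
    return None

msg = {"event_type": "compute.instance.create.end",
       "publisher_id": "compute.host1",
       "payload": {"instance_id": "abc-123"}}
print(handle(msg))  # compute.instance.create.end: abc-123
```

In a real deployment the dict would arrive from the message bus (e.g. via a RabbitMQ consumer) rather than being constructed inline.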
Re: [openstack-dev] [Change I30b127d6] Cheetah vs Jinja
There's a ton of reviews/comparisons out there, only a google away. From: Doug Hellmann [doug.hellm...@dreamhost.com] Sent: Tuesday, July 16, 2013 1:45 PM To: OpenStack Development Mailing List Subject: Re: [openstack-dev] [Change I30b127d6] Cheetah vs Jinja Great, I think I had the Mako syntax mixed up with a different templating language that depended on having a DOM to work on. Can someone put together a more concrete analysis than "this is working" so we can compare the tools? :-) Doug On Tue, Jul 16, 2013 at 12:29 PM, Nachi Ueno na...@ntti3.com wrote: Hi Doug Mako looks OK for config generation This is code in review. https://review.openstack.org/#/c/33148/23/neutron/services/vpn/device_drivers/template/ipsec.conf.template 2013/7/16 Doug Hellmann doug.hellm...@dreamhost.com: On Tue, Jul 16, 2013 at 9:51 AM, Daniel P. Berrange berra...@redhat.com wrote: On Tue, Jul 16, 2013 at 09:41:55AM -0400, Solly Ross wrote: (This email is with regards to https://review.openstack.org/#/c/36316/) Hello All, I have been implementing the Guru Meditation Report blueprint (https://blueprints.launchpad.net/oslo/+spec/guru-meditation-report), and the question of a templating engine was raised. Currently, my version of the code includes the Jinja2 templating engine (http://jinja.pocoo.org/), which is modeled after the Django templating engine (it was designed to be an implementation of the Django templating engine without requiring the use of Django) and is used in Horizon. Apparently, the Cheetah templating engine (http://www.cheetahtemplate.org/) is used in a couple of places in Nova. IMO, the Jinja template language produces much more readable templates, and I think it is the better choice for inclusion in the Report framework. It also shares a common format with Django (making it slightly easier to write for people coming from that area), and is also similar to template engines for other languages.
What does everyone else think? Repeating my comments from the review... I don't have an opinion on whether Jinja or Cheetah is a better choice, since I've essentially never used either of them (beyond deleting usage of Cheetah from libvirt). I do, however, feel we should not needlessly use multiple different templating libraries across OpenStack. We should take care to standardize on one option that is suitable for all our needs. So if the consensus is that Jinja is better, then IMHO, there would need to be a blueprint + expected timeframe to port existing Cheetah usage to use Jinja. Regards, Daniel The most current release of Cheetah is from 2010. I don't have a problem adding a new dependency on a tool that is actively maintained, with a plan to migrate off of the older tool to come later. The Neutron team seems to want to use Mako (https://review.openstack.org/#/c/37177/). Maybe we should pick one? Keep in mind that we won't always be generating XML or HTML, so my first question is how well does Mako work for plain text? Doug -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
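For readers unfamiliar with the two syntaxes under discussion, here is a rough side-by-side of the same plain-text loop. These fragments are illustrative only; see each project's documentation for the full syntax:

```
## Cheetah: directives start with '#', variables with '$'
#for $server in $servers
$server.name: $server.status
#end for

{# Jinja2: statements in {% %}, expressions in {{ }} #}
{% for server in servers %}
{{ server.name }}: {{ server.status }}
{% endfor %}
```

Both render plain text fine; the readability argument in the thread is mostly about the delimiter style.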
[openstack-dev] Opinions needed: Changing method signature in RPC callback ...
Hey y'all! Running into an interesting little dilemma with a branch I'm working on. Recently, I introduced a branch in oslo-common to optionally .reject() a kombu message on an exception. Currently, we always .ack() all messages even if the processing callback fails. For Ceilometer, this is a problem ... we have to guarantee we get all notifications. The patch itself was pretty simple, but didn't work :) The spawn_n() call was eating the exceptions coming from the callback. So, in order to get the exceptions it's simple enough to re-wrap the callback, but I need to pool.waitall() after the spawn_n() to ensure none of the consumers failed. Sad, but a necessary evil. And remember, it's only used in a special case; normal openstack rpc is unaffected and remains async. But it does introduce a larger problem ... I have to change the rpc callback signature. Old: callback(message) New: callback(message, delivery_info=None, wait_for_consumers=False) (The delivery_info is another thing: we were dumping the message info on the floor, but this has important info in it.) My worry is busting all the other callbacks out there that use oslo-common.rpc Some options: 1. embed all these flags and extra data in the message structure message = {'_context_stuff': ..., 'payload': {...}, '_extra_magic': {...}} 2. make a generic CallContext() object to include with message that has anything else we need (a one-time signature break) call_context = CallContext({delivery_info: {...}, wait: False}) callback(message, call_context) 3. some other ugly python hack that I haven't thought of yet. Look forward to your thoughts on a solution! Thanks -S My work-in-progress is here: https://github.com/SandyWalsh/openstack-common/blob/callback_exceptions/openstack/common/rpc/amqp.py#L373
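[Editor's note] An illustrative sketch of option 2 above: a context object carries delivery_info and the wait flag alongside the message, so future additions don't break the signature again. All names here (CallContext, the callbacks) are hypothetical, not the actual oslo code.

```python
# Hypothetical context object modelling option 2 from the message above.
class CallContext(object):
    def __init__(self, delivery_info=None, wait_for_consumers=False):
        self.delivery_info = delivery_info or {}
        self.wait_for_consumers = wait_for_consumers

# Old-style callback: breaks as soon as the driver passes extra args.
def old_callback(message):
    return message["payload"]

# New-style callback: a one-time signature break; anything new later
# goes into the context object instead of the signature.
def new_callback(message, context):
    if context.wait_for_consumers:
        pass  # the caller would pool.waitall() here
    return message["payload"], context.delivery_info

msg = {"payload": {"event": "compute.instance.create"}}
ctx = CallContext(delivery_info={"routing_key": "notifications.info"})
payload, info = new_callback(msg, ctx)
```

The `callback(message, **kwargs)` variant suggested in the reply below trades this explicitness for looser coupling: old callbacks keep working if they accept `**kwargs`, but typos in keyword names go unnoticed.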
Re: [openstack-dev] Opinions needed: Changing method signature in RPC callback ...
On 07/18/2013 11:09 AM, Sandy Walsh wrote: 2. make a generic CallContext() object to include with message that has anything else we need (a one-time signature break) call_context = CallContext({delivery_info: {...}, wait: False}) callback(message, call_context) or just callback(message, **kwargs) of course.
Re: [openstack-dev] Opinions needed: Changing method signature in RPC callback ...
On 07/18/2013 03:55 PM, Eric Windisch wrote: On Thu, Jul 18, 2013 at 10:09 AM, Sandy Walsh sandy.wa...@rackspace.com wrote: My worry is busting all the other callbacks out there that use oslo-common.rpc These callback methods are part of the Kombu driver (and maybe part of Qpid), but are NOT part of the RPC abstraction. These are private methods. They can be broken for external consumers of these methods, because there shouldn't be any. It will be a good lesson to anyone that tries to abuse private methods. I was wondering about that, but I assumed some parts of amqp.py were used by other transports as well (and not just impl_kombu.py) There are several callbacks in amqp.py that would be affected. -- Regards, Eric Windisch
Re: [openstack-dev] Opinions needed: Changing method signature in RPC callback ...
On 07/18/2013 05:56 PM, Eric Windisch wrote: These callback methods are part of the Kombu driver (and maybe part of Qpid), but are NOT part of the RPC abstraction. These are private methods. They can be broken for external consumers of these methods, because there shouldn't be any. It will be a good lesson to anyone that tries to abuse private methods. I was wondering about that, but I assumed some parts of amqp.py were used by other transports as well (and not just impl_kombu.py) There are several callbacks in amqp.py that would be affected. The code in amqp.py is used by the Kombu and Qpid drivers and might implement the public methods expected by the abstraction, but does not define it. The RPC abstraction is defined in __init__.py, and does not define callbacks. Other drivers (granted, only the ZeroMQ driver at present) are not expected to define a callback method and, as a private method, it would give them no template to follow nor an expectation to have this method. I'm not saying your proposed changes are bad or invalid, but there is no need to make concessions to the possibility that code outside of oslo would be using callback(). This opens up the option, besides creating a new method, to simply update all the existing method calls that exist in amqp.py, impl_kombu.py, and impl_qpid.py. Gotcha ... thanks Eric. Yeah, the outer api is very generic. I did a little more research and, unfortunately, it seems the inner amqp implementations are being used by others. So I'll have to be careful with the callback signature. Ceilometer, for example, seems to be leaving zeromq support as an exercise for the reader. Perhaps oslo-messaging will make this abstraction easier to enforce. Cheers!
-S -- Regards, Eric Windisch
Re: [openstack-dev] [Nova] New DB column or new DB table?
On 07/18/2013 11:12 PM, Lu, Lianhao wrote: Sean Dague wrote on 2013-07-18: On 07/17/2013 10:54 PM, Lu, Lianhao wrote: Hi fellows, Currently we're implementing the BP https://blueprints.launchpad.net/nova/+spec/utilization-aware-scheduling. The main idea is to have an extensible plugin framework on nova-compute where every plugin can get different metrics (e.g. CPU utilization, memory cache utilization, network bandwidth, etc.) to store into the DB, and the nova-scheduler will use that data from the DB for scheduling decisions. Currently we add a new table to store all the metric data and have nova-scheduler join-load the new table with the compute_nodes table to get all the data (https://review.openstack.org/35759). Someone is concerned about the performance penalty of the join-load operation when there is a lot of metric data stored in the DB for every single compute node. Don suggested adding a new column to the current compute_nodes table in the DB, putting all metric data into a dictionary key/value format and storing the json-encoded string of the dictionary in that new column. I'm just wondering which way has less performance impact: join-loading a new table with quite a lot of rows, or json encoding/decoding a dictionary with a lot of key/value pairs? Thanks, -Lianhao I'm really confused. Why are we talking about collecting host metrics in nova when we've got a whole project to do that in ceilometer? I think utilization-based scheduling would be a great thing, but it really ought to be interfacing with ceilometer to get that data. Storing it again in nova (or even worse, collecting it a second time in nova) seems like the wrong direction. I think there was an equiv patch series at the end of Grizzly that was pushed out for the same reasons. If there is a reason ceilometer can't be used in this case, we should have that discussion here on the list.
Because my initial reading of this blueprint and the code patches is that it partially duplicates ceilometer function, which we definitely don't want to do. Would be happy to be proved wrong on that. -Sean Using ceilometer as the source of those metrics was discussed in the nova-scheduler subgroup meeting (see #topic extending data in host state in the following link): http://eavesdrop.openstack.org/meetings/scheduler/2013/scheduler.2013-04-30-15.04.log.html In that meeting, all agreed that ceilometer would be a great source of metrics for the scheduler, but many of them don't want to make ceilometer a mandatory dependency for the nova scheduler. This was also discussed at the Havana summit and rejected, since we didn't want to introduce the external dependency of Ceilometer into Nova. That said, we already have hooks at the virt layer for collecting host metrics, and we're talking about removing the pollsters from nova compute nodes if the data can be collected from these existing hooks. Whatever solution the scheduler group decides to use should utilize the existing (and maintained/growing) mechanisms we have in place there. That is, it should likely be a special notification driver that can get the data back to the scheduler in a timely fashion. It wouldn't have to use the rpc mechanism if it didn't want to, but it should be a plug-in at the notification layer. Please don't add yet another way of pulling metric data out of the hosts. -S Besides, currently ceilometer doesn't have host metrics, like the cpu/network/cache utilization data of the compute node host, which will affect the scheduling decision. What ceilometer has currently is VM metrics, like the cpu/network utilization of each VM instance. After the nova compute node collects the host metrics, those metrics could also be fed into the ceilometer framework (e.g. through a ceilometer listener) for further processing, like alarming, etc.
-Lianhao
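[Editor's note] A toy comparison (not the actual nova schema) of the two storage options being debated: per-metric rows that must be join-loaded against compute_nodes, versus a single json-encoded text column on compute_nodes. Table and metric names are made up.

```python
import json

# Option A: one row per metric, joined to compute_nodes at query time.
# With N hosts and M metrics each, the scheduler's query touches N*M rows.
metric_rows = [
    {"compute_node_id": 1, "name": "cpu.user", "value": 0.42},
    {"compute_node_id": 1, "name": "net.bandwidth", "value": 125.0},
]

# Option B: the same data flattened into one JSON blob stored in a new
# text column on compute_nodes -- one row per host, no join.
metrics = {r["name"]: r["value"] for r in metric_rows}
json_column = json.dumps(metrics)

# The scheduler-side read for option B is a single loads() per host,
# trading join cost for encode/decode cost on every read and losing
# the ability to filter on individual metrics in SQL.
decoded = json.loads(json_column)
```

That last point is the real trade-off: the JSON column is opaque to the query planner, so any per-metric filtering has to happen in Python after the decode.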
Re: [openstack-dev] [Nova] New DB column or new DB table?
On 07/19/2013 09:43 AM, Sandy Walsh wrote: [full quote of the preceding exchange trimmed] I should also add that if you go the notification route, that doesn't close the door on ceilometer integration. All you need is a means to get the data from the notification driver to the scheduler; that part could easily be replaced with a ceilometer driver if an operator wanted to go that route. The benefits of using Ceilometer would be having access to the downstream events/meters and generated statistics that could be produced there.
We certainly don't want to add an advanced statistical package or event-stream manager to Nova when Ceilometer already has aspirations of that. The out-of-the-box nova experience should be better scheduling when simple host metrics are used internally, but really great scheduling when integrated with Ceilometer. [remainder of the quoted message trimmed]
Re: [openstack-dev] [Nova] Ceilometer vs. Nova internal metrics collector for scheduling
On 07/19/2013 12:30 PM, Sean Dague wrote: On 07/19/2013 10:37 AM, Murray, Paul (HP Cloud Services) wrote: If we agree that something like capabilities should go through Nova, what do you suggest should be done with the change that sparked this debate: https://review.openstack.org/#/c/35760/ I would be happy to use it or a modified version. CPU sys, user, idle, iowait time isn't capabilities though. That's a dynamically changing value. I also think the current approach, where this is point-in-time sampling (because we only keep a single value), is going to cause some oddly pathological behavior if you try to use it as scheduling criteria. I'd really appreciate the views of more nova core folks on this thread, as it looks like these blueprints have seen pretty minimal code review at this point. H3 isn't that far away, and there are a lot of high-priority things ahead of this, and only so much coffee and review time in a day. You really need to have a moving window average of these meters in order to have anything sensible. Also, some sort of view into the pipeline of scheduler requests (what's coming up?) Capabilities are only really used in the host filtering phase. The host weighing phase is where these measurements would be applied. -Sean
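[Editor's note] A minimal sketch of the moving window average Sandy suggests above: a single point-in-time CPU sample is noisy scheduling input, while a window smooths out spikes. Class name and window size are illustrative, not from any nova code.

```python
from collections import deque

class MovingAverage(object):
    """Keep the last `window` samples and average over them."""

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)  # old samples fall off the end

    def update(self, value):
        self.samples.append(value)
        return self.average()

    def average(self):
        return sum(self.samples) / float(len(self.samples))

# A CPU-load spike followed by calm: the window damps the spike instead
# of letting one sample dominate the weighing decision.
avg = MovingAverage(window=3)
for sample in [0.90, 0.10, 0.20]:
    smoothed = avg.update(sample)
```

A scheduler weigher consuming `smoothed` instead of the raw last sample avoids the pathological flip-flopping Sean describes, at the cost of reacting a few periods late to genuine load changes.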
Re: [openstack-dev] A simple way to improve nova scheduler
On 07/19/2013 05:01 PM, Boris Pavlovic wrote: Sandy, Hmm, I don't know that algorithm. But our approach doesn't have exponential exchange. I don't think that in a 10k-node cloud we will have problems with 150 RPC calls/sec. Even at 100k we will have only 1.5k RPC calls/sec. More than that, compute nodes update their state in the DB through the conductor, which produces the same number of RPC calls. So I don't see any explosion here. Sorry, I was commenting on Soren's suggestion from way back (essentially listening on a separate exchange for each unique flavor ... so no scheduler was needed at all). It was a great idea, but fell apart rather quickly. The existing approach the scheduler takes is expensive (asking the db for the state of all hosts), and polling the compute nodes might be do-able, but you're still going to have latency problems waiting for the responses (the states are invalid nearly immediately, especially if a fill-first scheduling algorithm is used). We ran into this problem before in an earlier scheduler implementation. The round-tripping kills. We have a lot of really great information on host state in the form of notifications right now. I think having a service (or notification driver) listening for these and keeping the HostState incrementally updated (and reported back to all of the schedulers via the fanout queue) would be a better approach. -S Best regards, Boris Pavlovic Mirantis Inc. On Fri, Jul 19, 2013 at 11:47 PM, Sandy Walsh sandy.wa...@rackspace.com wrote: On 07/19/2013 04:25 PM, Brian Schott wrote: I think Soren suggested this way back in Cactus: to use MQ for compute node state rather than the database, and it was a good idea then. The problem with that approach was the number of queues went exponential as soon as you went beyond simple flavors. Add capabilities or other criteria and you get an explosion of exchanges to listen to.
On Jul 19, 2013, at 10:52 AM, Boris Pavlovic bo...@pavlovic.me wrote: Hi all, In Mirantis, Alexey Ovtchinnikov and I are working on nova scheduler improvements. As far as we can see the problem, the scheduler now has two major issues: 1) Scalability. Factors that contribute to bad scalability are these: *) Each compute node, every periodic task interval (60 sec by default), updates its resource state in the DB. *) On every boot request the scheduler has to fetch information about all compute nodes from the DB. 2) Flexibility. Flexibility perishes due to problems with: *) Adding new complex resources (such as big lists of complex objects, e.g. required by PCI Passthrough https://review.openstack.org/#/c/34644/5/nova/db/sqlalchemy/models.py) *) Using different sources of data in the scheduler, for example from cinder or ceilometer (as required by the Volume Affinity Filter https://review.openstack.org/#/c/29343/). We found a simple way to mitigate these issues by avoiding DB usage for host state storage. A more detailed discussion of the problem state and one possible solution can be found here: https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit# Best regards, Boris Pavlovic Mirantis Inc.
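[Editor's note] A toy model of the approach Sandy describes above: rather than each scheduler polling the DB, a listener consumes host-state notifications and keeps an in-memory map current, updated incrementally. Everything here (class, event shape, field names) is illustrative, not nova scheduler code.

```python
class HostStateCache(object):
    """Fold incoming notifications into a cached view of host state."""

    def __init__(self):
        self.hosts = {}

    def on_notification(self, event):
        # Incremental update: only the fields present in this event change,
        # so no full DB fetch is needed per scheduling request.
        host = event["host"]
        self.hosts.setdefault(host, {}).update(event["payload"])

    def candidates(self, min_free_ram_mb):
        # A scheduler filter pass against the cached state.
        return [h for h, s in self.hosts.items()
                if s.get("free_ram_mb", 0) >= min_free_ram_mb]

cache = HostStateCache()
cache.on_notification({"host": "node1", "payload": {"free_ram_mb": 2048}})
cache.on_notification({"host": "node2", "payload": {"free_ram_mb": 512}})
# A later notification supersedes node2's earlier state.
cache.on_notification({"host": "node2", "payload": {"free_ram_mb": 4096}})
```

In the real proposal this cache would be fed from the fanout queue so every scheduler instance converges on the same view; the race Boris and Sandy debate below is exactly the window between a notification being emitted and every cache applying it.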
Re: [openstack-dev] A simple way to improve nova scheduler
On 07/19/2013 05:36 PM, Boris Pavlovic wrote: Sandy, I don't think that we have such problems here, because the scheduler doesn't poll compute_nodes; it's the other way around: compute_nodes notify the scheduler about their state (instead of updating their state in the DB). So, for example, if the scheduler sends a request to a compute_node, the compute_node is able to make an rpc call to the schedulers immediately (not after 60 sec). So there are almost no races. There are races that occur between the eventlet request threads. This is why the scheduler has been switched to single threaded and we can only run one scheduler. This problem may have been eliminated with the work that Chris Behrens and Brian Elliott were doing, but I'm not sure. But certainly, the old approach of having the compute node broadcast status every N seconds is not suitable and was eliminated a long time ago. Best regards, Boris Pavlovic Mirantis Inc. [remainder of the earlier quoted exchange trimmed]
Re: [openstack-dev] [Openstack] Ceilometer and notifications
On 08/01/2013 07:02 AM, Mark McLoughlin wrote: On Thu, 2013-08-01 at 10:36 +0200, Julien Danjou wrote: On Thu, Aug 01 2013, Sam Morrison wrote: OK, so is it that ceilometer just leaves the message on the queue or only consumes certain messages? Ceilometer uses its own queue. There might be other processes consuming these notifications, so removing them may not be a good idea. The problem may be that the notification sender creates a queue by default even if there's no consumer on it. Maybe that's something we should avoid doing in Oslo (Cc'ing -dev to get advice on that). I'm missing the context here, but it sounds like the default notifications queue created isn't the one consumed by ceilometer, so it fills up and we just shouldn't be creating that queue. Sounds reasonable to me. Definitely file a bug for it. Hmm, if notifications are turned on, it should fill up. For billing purposes we don't want to lose events simply because there is no consumer. Operations would alert on it and someone would need to put out the fire. That's the reason we create the queue up front in the first place. Ideally, we could write only to the exchange, but we need the queue to ensure we don't lose any events. The CM Collector consumes from two queues: its internal queue and the Nova queue (if configured). If CM is looking at the wrong nova queue by default, the bug would be over there. Cheers, Mark.
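[Editor's note] A minimal pure-Python model of the AMQP behaviour under discussion: an exchange only dispatches to bound queues, so a message published before any queue exists is simply gone, whereas declaring the queue up front (as the notifier does) lets events accumulate until a consumer arrives. This is a toy illustration of the semantics, not kombu or broker code.

```python
class Exchange(object):
    """Toy AMQP-style exchange: dispatches, never stores."""

    def __init__(self):
        self.queues = []

    def declare_queue(self):
        q = []
        self.queues.append(q)
        return q

    def publish(self, msg):
        # With no queues bound, this loop does nothing: the message is lost.
        for q in self.queues:
            q.append(msg)

# No queue declared up front: the billing event is dropped on the floor.
lossy = Exchange()
lossy.publish("compute.instance.create")

# Queue declared before any publishing: events are retained even though
# no consumer has attached yet.
durable = Exchange()
billing_queue = durable.declare_queue()
durable.publish("compute.instance.create")
```

This is why Sandy argues the pre-created queue is a feature for billing, and the flip side of Julien's complaint: an unconsumed pre-created queue retains everything, so it fills up.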
Re: [openstack-dev] Weight normalization in scheduler
On 08/01/2013 04:24 AM, Álvaro López García wrote: Hi all. TL;DR: I've created a blueprint [1] regarding weight normalization. I would be very glad if somebody could examine and comment on it. Something must have changed. It's been a while since I've done anything with the scheduler, but normalized weights is the way it was designed and implemented. The separate weighing plug-ins are responsible for taking the specific units (cpu load, disk, ram, etc.) and converting them into normalized 0.0-1.0 weights. Internally the plug-ins can work however they like, but their output should be 0-1. The multiplier, however, could scale this outside that range (if disk is more important than cpu, for example). Actually, I remember it being offset + scale * weight, so you could put certain factors in bands: cpu: 1000+, disk: 1+, etc. Hopefully offset is still there too? -S Recently I've been developing some weighers to be used within nova and I found that the weight system was using raw values. This makes it difficult for an operator to establish the importance of a weigher against the rest of them, since the values can range freely and one big magnitude returned by a weigher could shade another one. One solution is to inflate either the multiplier or the weight that is returned by the weigher, but this is an ugly hack (for example, if you increase the RAM on your systems, you will need to adjust the multipliers again). A much better approach is to use weight normalization before actually using the weights. With weight normalization a weigher will still return a list of RAW values, but the BaseWeightHandler will normalize all of them into a range of values (0.0 and 1.0) before adding them up. This way, the weight for a given object will be: weight = w1_multiplier * norm(w1) + w2_multiplier * norm(w2) + ... This makes it easier to establish the importance of a weigher relative to the rest, by just adjusting the multiplier.
This is explained in [1], and implemented in [2] (with some suggestions by the reviewers). [1] https://blueprints.launchpad.net/openstack/?searchtext=normalize-scheduler-weights [2] https://review.openstack.org/#/c/27160/ Thanks for your feedback,
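[Editor's note] A sketch of the normalization the blueprint proposes, implementing the formula above: raw weigher outputs are rescaled into [0.0, 1.0] per weigher before the multiplier is applied, so one large-magnitude weigher (free RAM in MB) can't drown out a small-magnitude one (CPU idle fraction). Function names and numbers are illustrative.

```python
def normalize(raw_weights):
    """Min-max rescale a weigher's raw outputs into [0.0, 1.0]."""
    lo, hi = min(raw_weights), max(raw_weights)
    if hi == lo:
        return [0.0 for _ in raw_weights]  # all hosts equal on this axis
    return [(w - lo) / float(hi - lo) for w in raw_weights]

def combined_weights(weighers):
    """weighers: list of (multiplier, raw_values_per_host) tuples.

    Implements weight = m1 * norm(w1) + m2 * norm(w2) + ... per host.
    """
    per_weigher = [[m * w for w in normalize(raws)] for m, raws in weighers]
    return [sum(ws) for ws in zip(*per_weigher)]

# Two hosts, two weighers. Without normalization, the 2048 vs 8192 RAM
# values would completely shade the 0.9 vs 0.1 CPU-idle values.
ram = (1.0, [2048, 8192])
cpu = (2.0, [0.9, 0.1])
weights = combined_weights([ram, cpu])  # [host0, host1]
```

With the CPU multiplier set to 2.0, host0 (idle CPU, little RAM) now outweighs host1 (busy CPU, lots of RAM), which is exactly the operator control the blueprint is after. Sandy's `offset + scale * weight` banding can be layered on the same normalized values.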
Re: [openstack-dev] [Openstack] Ceilometer and notifications
On 08/01/2013 09:19 AM, Julien Danjou wrote: On Thu, Aug 01 2013, Sandy Walsh wrote: Hmm, if notifications are turned on, it should fill up. For billing purposes we don't want to lose events simply because there is no consumer. Operations would alert on it and someone would need to put out the fire. So currently, are we possibly losing events because we don't use the standard queue but one defined by Ceilometer upon connection? We can't consume events from the default notifications queue or we would break any tool possibly using it. Each consuming tool needs its own copy of them. Right, that is a concern. Within RAX we have two downstream services that consume notifications (StackTach and Yagi), and we've configured nova to write to two queues. --notification_topics can take a list. Isn't there a way to queue the message in exchanges if there's no queue at all? I don't think so, but if that was possible it would solve our problem. AFAIK, amqp only uses the exchange as a dispatcher and all storage is done in the queue ... but I could be wrong. I vaguely recall there being a durable exchange setting as well as a durable queue. I'll do some investigating.
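[Editor's note] A hedged example of the multi-queue setup Sandy describes: fanning the same notifications out to two topics so a billing consumer and a monitoring consumer each get their own copy. Option names follow the oslo notifier of that era and the topic names are made up; verify both against your deployment.

```ini
[DEFAULT]
notification_driver = nova.openstack.common.notifier.rpc_notifier
# One queue per downstream consumer; each tool gets its own copy.
notification_topics = notifications,monitor
```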