Re: [openstack-dev] [Ironic][Ceilometer] Proposed Change to Sensor meter naming in Ceilometer
On Mon, 20 Oct 2014, Jim Mankovich wrote:

> I'll propose something via a spec to ceilometer for sensor naming
> which will include the ability to support the new health sensor
> information.

Excellent.

> Do you happen to know what some of the use cases are for the current
> reporting of sensor information?

Sadly, not really. I'm hoping some observers of this thread will chime in.

--
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Ironic][Ceilometer] Proposed Change to Sensor meter naming in Ceilometer
On Mon, 20 Oct 2014, Jim Mankovich wrote:

> On 10/20/2014 6:53 AM, Chris Dent wrote:
>> On Fri, 17 Oct 2014, Jim Mankovich wrote:
>>> See answers inline. I don't have any concrete answers as to how to
>>> deal with some of the questions you brought up, but I do have some
>>> more detail that may be useful to further the discussion.
>>
>> That seems like progress to me.

And thanks for keeping it going some more. I'm going to skip your other (very useful) comments and go (almost) straight (below) to one thing which goes to the root of the queries I've been making. Most of the rest of what you said makes sense and we seem to be mostly in agreement. I suppose the next step would be to propose a spec?

https://github.com/openstack/ceilometer-specs

> We have 2 use cases:
>   Get all the sensors within a given platform (based on ironic node id)
>   Get all the sensors of a given "type/name", independent of platform
> Others?

These are not use cases, these are tasks. That's because these say nothing about the thing you are actually trying to achieve. "Get all the sensors within a given platform" is a task without a purpose. You're not just going to stop there, are you? If so, why did you get the information in the first place?

A use case could be:

* I want to get all the sensors of a given platform so I can .

Or even better something like:

* I want to .

And the way to do that would just so happen to be getting all the sensors.

I realize this is perhaps pedantic hair-splitting, but I think it can be useful at least some of the time. I know from my own experience that I am very rarely able to get the Ceilometer API to give me the information that I actually want (e.g. "How many vcpus are currently in action?"). This feels like the result of data availability driving the query engine rather than vice versa.
Re: [openstack-dev] [Ironic][Ceilometer] Proposed Change to Sensor meter naming in Ceilometer
On Fri, 17 Oct 2014, Jim Mankovich wrote:

> See answers inline. I don't have any concrete answers as to how to deal
> with some of the questions you brought up, but I do have some more
> detail that may be useful to further the discussion.

That seems like progress to me.

> Personally, I would like to see the _(0x##) removed from the Sensor ID
> string (by the ipmitool driver) before it returns sensors to the Ironic
> conductor. I just don't see any value in this extra info. This 0x##
> addition only helps if a vendor used the exact same Sensor ID string
> for multiple sensors of the same sensor type, i.e. multiple sensors of
> type "Temperature", each with the exact same Sensor ID string of "CPU",
> instead of giving each Sensor ID string a unique name like "CPU 1",
> "CPU 2", ...

Is it worthwhile metadata to save, even if it isn't in the meter name?

> In a heterogeneous platform environment, the Sensor ID string is likely
> going to be different per vendor, so your question "If temperature ...
> on any system board ... on any hardware, notify the authorities" is
> going to be tough because each vendor may name their "system board"
> differently. But, I bet that vendors use similar strings, so worst
> case, your alarm creation could require 1 alarm definition per vendor.

The alarm definition I want to make is (as an operator, not as a dev): "My puter's too hot, hlp!" Making that easy is the proper (to me) endpoint of a conversation about how to name meters.

> I see generic naming as somewhat problematic. If you lump all the
> temperature sensors for a platform under hardware.temperature the
> consumer will always need to query for a specific temperature sensor
> that it is interested in, like "system board". The notion of having
> different samples from multiple sensors under a single generic name
> seems harder to deal with to me. If you have multiple temperature
> samples under the same generic meter name, how do you figure out what
> all the possible temperature samples actually exist?
I'm not suggesting all temperature sensors under one name ("hardware.temperature"), but all sensors which identify as the same thing (e.g. "hardware.temperature.system_board") under the same name.

I'm not very informed about IPMI or hardware sensors, but I do have some experience in using names and identifiers (don't we all!) and I find that far too often we name things based on where they come from rather than how we wish to address them after genesis. Throughout ceilometer I think there are tons of opportunities to improve the naming of meters and as a result improve the UI for people who want to do things with the data.

So from my perspective, with regard to naming IPMI (and other hardware sensor) related samples, I think we need to make a better list of the use cases which the samples need to satisfy and use that to drive a naming scheme.
Re: [openstack-dev] [Ceilometer] Ceilometer-Alarm-Not Working
On Mon, 20 Oct 2014, david jhon wrote:

> 2014-10-20 16:33:07.854 30437 TRACE ceilometer.alarm.service CommunicationError: Error communicating with http://193.168.4.121:8777 [Errno 111]$
> 2014-10-20 16:33:07.854 30437 TRACE ceilometer.alarm.service
>
> How do I fix it?

It looks like it may be that either your ceilometer-api service is not running or is not bound to the correct network interface. Since you've got a ceilometer-api.log it may be that the service is unreachable because of some configuration setting (firewall, alarm and api service on different networks, that sort of thing).
Re: [openstack-dev] [Ironic][Ceilometer] Proposed Change to Sensor meter naming in Ceilometer
On Thu, 16 Oct 2014, Jim Mankovich wrote:

> What I would like to propose is dropping the ipmi string from the name
> altogether and appending the Sensor ID to the name instead of to the
> Resource ID. So, transforming the above to the new naming would result
> in the following:
>
> | Name                                     | Type  | Unit | Resource ID                          |
> | hardware.current.power_meter_(0x16)      | gauge | W    | edafe6f4-5996-4df8-bc84-7d92439e15c0 |
> | hardware.temperature.system_board_(0x15) | gauge | C    | edafe6f4-5996-4df8-bc84-7d92439e15c0 |
>
> [plus sensor_provider in resource_metadata]

If this makes sense for the kinds of queries that need to happen then we may as well do it, but I'm not sure it does. When I was writing the consumer code for the notifications the names of the meters were a big open question that was hard to resolve because of insufficient data and input on what people really need to do with the samples.

The scenario you've listed is getting all sensors on a given single platform. What about the scenario where you want to create an alarm that says "If temperature gets over X on any system board on any of my hardware, notify the authorities"? Will having the "_(0x##)" qualifier allow that to work? I don't actually know: are those qualifiers standard in some way or are they specific to different equipment? If they are different, having them in the meter name makes creating a useful alarm in a heterogeneous environment a bit more of a struggle, doesn't it?

Perhaps (if they are not standard) this would work:

| hardware.current.power_meter | gauge | W | edafe6f4-5996-4df8-bc84-7d92439e15c0 |

with both sensor_provider and whatever that qualifier is called in the metadata? Then the name remains sufficiently generic to allow aggregates across multiple systems, while still having the necessary info to narrow to different sensors of the same type.

> I understand that this proposed change is not backward compatible with
> the existing naming, but I don't really see a good solution that would
> retain backward compatibility.
I think we should strive to worry less about such things, especially when it's just names in data fields. Not always possible, or even a good idea, but sometimes it's a win.
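To make the naming discussion concrete, here is a rough sketch (my own invention, not actual Ironic or Ceilometer code) of what moving the "_(0x##)" qualifier out of the meter name and into resource metadata might look like:

```python
import re

# Hypothetical helper: split a raw IPMI Sensor ID like "System Board
# (0x15)" into a generic meter name plus the vendor qualifier, so the
# qualifier can live in resource_metadata instead of the meter name.
QUALIFIER = re.compile(r"\s*\((0x[0-9a-fA-F]+)\)\s*$")

def normalize_sensor(sensor_type, sensor_id):
    """Return (meter_name, metadata) for a raw IPMI sensor reading."""
    match = QUALIFIER.search(sensor_id)
    qualifier = match.group(1) if match else None
    base = QUALIFIER.sub("", sensor_id).strip()
    # Lowercase and underscore the human-readable ID:
    # "System Board" becomes "system_board".
    slug = re.sub(r"\W+", "_", base.lower()).strip("_")
    meter = "hardware.%s.%s" % (sensor_type.lower(), slug)
    return meter, {"sensor_qualifier": qualifier}

# e.g. normalize_sensor("Temperature", "System Board (0x15)")
# -> ("hardware.temperature.system_board", {"sensor_qualifier": "0x15"})
```

With something like this the alarm definition can match on the generic `hardware.temperature.system_board` across vendors, while the qualifier stays available in metadata for anyone who needs to narrow to a specific sensor.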
Re: [openstack-dev] [all] add cyclomatic complexity check to pep8 target
On Fri, 17 Oct 2014, Daniel P. Berrange wrote:

> IMHO this tool is of pretty dubious value. I mean that function is long
> for sure, but it is by no means a serious problem in the Nova libvirt
> codebase. The stuff it complains about in the libvirt/config.py file is
> just an incredibly stupid thing to highlight.

I find a lot of the OpenStack code very hard to read. If it is very hard to read it is very hard to maintain, whether that means fix or improve.

That said, the value I see in these kinds of tools is not specifically in preventing complexity, but in providing entry points for people who want to fix things. You don't know where to start (because you haven't yet got the insight or experience): run flake8 or pylint or some other tool, do what it tells you. In the process you will:

* learn more about the code
* probably find bugs
* make an incremental improvement to something that needs it
Re: [openstack-dev] [kolla] on Dockerfile patterns
On Thu, 16 Oct 2014, Lars Kellogg-Stedman wrote:

> On Fri, Oct 17, 2014 at 12:44:50PM +1100, Angus Lees wrote:
>
> You just need to find the pid of a process in the container (perhaps
> using docker inspect to go from container name -> pid) and then:
>
>     nsenter -t $pid -m -u -i -n -p -w
>
> Note also that the 1.3 release of Docker ("any day now") will sport a

Yesterday:

http://blog.docker.com/2014/10/docker-1-3-signed-images-process-injection-security-options-mac-shared-directories/
Re: [openstack-dev] [kolla] on Dockerfile patterns
On Tue, 14 Oct 2014, Jay Pipes wrote:

> This means you now have to know the system administrative commands and
> setup for two operating systems ... or go find a Fedora20 image for
> mysql somewhere.

For the sake of conversation and devil's advocacy, let me ask, in response to this paragraph, "why [do you] have to know [...]?"

If you've got a docker container that is running mysql, IME that's _all_ it should be doing, and any (post-setup) management you are doing of it should be happening by creating a new container and trashing the one that is not right, not by manipulating the existing container. The operating system should be as close to invisible as you can get it. Everything in the Dockerfile should be about getting the service to a fully installed and configured state, and it should close with the one command or entrypoint it is going to run/use; in the mysql case that's one of the various ways to get mysqld happening.

If the goal is to use Docker and to have automation, I think it would be better to automate the generation of suitably layered/hierarchical Dockerfiles (using whatever tool of choice), not to have Dockerfiles which then use config tools to populate out the image. If such config tools are necessary in the container it is pretty much guaranteed that the container is being overburdened with too many responsibilities.[1]

> Is there an official MySQL docker image? I found 553 Dockerhub
> repositories for MySQL images...

https://github.com/docker-library/mysql

Docker-library appears to be the place for official things.

> By layered model, are you referring to the bottom layer being the
> Docker image and then upper layers being stuff managed by a
> configuration management system?

I assume it's the layering afforded by union file systems: it makes building images based on other images cheap and fast. The cheapness and fastness is one of the reasons why expressive Dockerfiles[1] are important: each line is a separate checkpointed image.
[1] That is, Dockerfiles which do the install and config work rather than calling on other stuff to do the work.
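As an illustration of "automating the generation of suitably layered Dockerfiles" rather than baking config tools into the image, here is a toy sketch; the template fragments and the `render_dockerfile` helper are invented for this example and are not any real kolla tooling:

```python
# Toy sketch (names invented): generate a layered Dockerfile from plain
# data rather than shipping a config-management tool inside the image.
# Each rendered instruction becomes its own checkpointed layer, which
# is what makes rebuilds on shared base layers cheap.
BASE = "FROM {base}\n"
PKG = "RUN apt-get update && apt-get install -y {packages}\n"
CONF = "COPY {src} {dest}\n"
CMD = "CMD [\"{command}\"]\n"

def render_dockerfile(service):
    parts = [BASE.format(base=service["base"])]
    if service.get("packages"):
        parts.append(PKG.format(packages=" ".join(service["packages"])))
    for src, dest in service.get("configs", []):
        parts.append(CONF.format(src=src, dest=dest))
    # One service, one entrypoint: close with the single command to run.
    parts.append(CMD.format(command=service["command"]))
    return "".join(parts)

mysql = {
    "base": "debian:wheezy",
    "packages": ["mysql-server"],
    "configs": [("my.cnf", "/etc/mysql/my.cnf")],
    "command": "mysqld",
}
print(render_dockerfile(mysql))
```

The point of the sketch is only the shape of the idea: the generator owns the layering decisions up front, and the resulting Dockerfile does nothing but install, configure, and declare its one command.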
Re: [openstack-dev] [kolla] on Dockerfile patterns
On Tue, 14 Oct 2014, Angus Lees wrote:

> 2. I think we should separate out "run the server" from "do once-off
> setup".

Yes! Otherwise it feels like the entire point of using containers and dockerfiles is rather lost.
Re: [openstack-dev] Treating notifications as a contract
On Tue, 7 Oct 2014, Sandy Walsh wrote:

> Haven't had any time to get anything written down (pressing deadlines
> with StackTach.v3) but open to suggestions. Perhaps we should just add
> something to the oslo.messaging etherpad to find time at the summit to
> talk about it?

Have you got a link for that?

Another topic that I think is at least somewhat related to the standardizing/contractualizing notifications topic is deprecating polling (to get metrics/samples). In the ceilometer side of the telemetry universe, if samples can't be gathered via notifications then somebody writes a polling plugin or agent and sticks it in the ceilometer tree, where it is run as either an independent agent (c.f. the new ipmi-agent) or a plugin under the compute-agent or a plugin under the central-agent.

This is problematic in a few ways (at least to me):

* Those plugins distract from the potential leanness of a core
  ceilometer system.
* The meters created by those plugins are produced for "ceilometer"
  rather than for "telemetry". Yes, of course you can re-publish the
  samples in all sorts of ways.
* The services aren't owning the form and publication of information
  about themselves.

There are solid arguments against each of these problems individually, but as a set I find them saying "services should make more notifications" pretty loud and clear, and obviously to make that work we need tidy notifications with good clean semantics.
Re: [openstack-dev] [qa] In-tree functional test vision
On Mon, 25 Aug 2014, Joe Gordon wrote:

> On Mon, Aug 4, 2014 at 3:29 AM, Chris Dent wrote:
>> For constraints: Will tempest be available as a stable library? Is
>> using tempest (or the same library across all projects) a good or bad
>> thing? Seems there's some disagreement on both of these.
>
> Yes, there is a separate thread on spinning out a tempest-lib (not sure
> on what the final name will be yet) that functional tests can use.
> Although I think there is a lot to be done before needing the
> tempest-lib.

What's the status of tempest-lib? Looking at the repo it appears that other things may be taking priority at the moment.

As I said in the notifications thread: with summit approaching and kilo open for business, now seems a good time to be talking about what kinds of structure we want to apply to in-tree functional testing.
Re: [openstack-dev] Treating notifications as a contract
On Wed, 3 Sep 2014, Sandy Walsh wrote:

> Good goals. When Producer and Consumer know what to expect, things are
> good ... "I know to find the Instance ID". When the consumer wants to
> deal with a notification as a generic object, things get tricky ("find
> the instance ID in the payload", "What is the image type?", "Is this an
> error notification?"). Basically, how do we define the principal
> artifacts for each service and grant the consumer easy/consistent
> access to them? (like the 7-W's above) I'd really like to find a way
> to solve that problem.
>
>> Is that a good summary? What did I leave out or get wrong?
>
> Great start! Let's keep it simple and do-able.

Has there been any further thinking on these topics? Summit is soon and kilo specs are starting, so I imagine more people than just me are hoping to get rolling on plans. If there is going to be a discussion at summit I hope people will be good about keeping some artifacts for those of us watching from afar.

It seems to me that if the notifications ecosystem becomes sufficiently robust and resilient we ought to be able to achieve some interesting scale and distributed-ness opportunities throughout OpenStack, not just in telemetry/metering/eventing (choose your term of art).
Re: [openstack-dev] What's a dependency (was Re: [all][tc] governance changes for "big tent"...) model
On Fri, 3 Oct 2014, Devananda van der Veen wrote:

> Nope. I am not making any value judgement whatsoever. I'm describing
> dependencies for minimally satisfying the intended purpose of a given
> project. For example, Nova's primary goal is not "emit telemetry", it
> is "scalable, on demand, self service access to compute resources" [1]

So while I agree with the usefulness of being able to describe these technical dependencies for minimal satisfaction, and agree that it is a useful tool for creating boundaries and compartments for testing, the reason I started the subthread is because I think this form of statement

    "I'm describing [...] for [...] of a given _project_."

is prejudicing a certain set of priorities and perspectives which over the long term are damaging to the health of the larger ecosystem (the big tent or whatever it is), especially in terms of satisfying people other than us haute dev types.

It's pretty clear everyone's intentions are pretty much in the right and similar place, but there's some friction over language and details. The tribalism associated with "project" appears to contribute:

* to getting people off track a bit
* to keeping us in technical solutions when what we need are both
  technical solutions and organizational/social solutions

Presumably (I wasn't there to see it) the program/project distinction was an effort to overcome this, but it hasn't worked. Of course not: you don't gain much if you have people in a room with name A and all you do is put a new name on the room and don't change the people or the room. We need to do more this time around than change some names.
Re: [openstack-dev] What's a dependency (was Re: [all][tc] governance changes for "big tent"...) model
On Fri, 3 Oct 2014, Joe Gordon wrote:

> data is coming from here:
> https://github.com/jogo/graphing-openstack/blob/master/openstack.yaml
> and the key is here: https://github.com/jogo/graphing-openstack

Cool, thanks.

>> Many of those services expect[1] to be able to send notifications (or
>> be polled by) ceilometer[2]. We've got an ongoing thread about the
>> need to contractualize notifications. Are those contracts (or the
>> desire for them) a form of dependency? Should they be?
>
> So in the case of notifications, I think that is a
>
>     Ceilometer CAN-USE Nova THROUGH notifications

Your statement here is part of the reason I asked. I think it is possible to argue that the dependency has the opposite order:

    Nova might like to use Ceilometer to keep metrics via notifications

or perhaps:

    Nova CAN-USE Ceilometer FOR telemetry THROUGH notifications and polling

This is perhaps not the strict technological representation of the dependency, but it represents the sort of pseudo-social relationships between projects: Nova desires for Ceilometer (or at least something doing telemetry) to exist. Ceilometer itself is^wshould be agnostic about what sort of metrics are coming its way. It should accept them, potentially transform them, store them, and make them available for later use (including immediately). It doesn't^wshouldn't really care if Nova exists or not.

There are probably lots of other relationships of this form between other services, thus the question: Is a use-of-notifications something worth tracking? I would say yes.
[openstack-dev] What's a dependency (was Re: [all][tc] governance changes for "big tent"...) model
On Fri, 3 Oct 2014, Joe Gordon wrote:

> * services that nothing depends on
> * services that don't depend on other services
>
> Latest graph: http://i.imgur.com/y8zmNIM.png

I'm hesitant to open this can but it's just lying there waiting, wiggling like good bait, so: How are you defining dependency in that picture?

For example: Many of those services expect[1] to be able to send notifications (or be polled by) ceilometer[2]. We've got an ongoing thread about the need to contractualize notifications. Are those contracts (or the desire for them) a form of dependency? Should they be?

[1] It's not that it is a strict requirement but lots of people involved with the other projects contribute code to ceilometer or make changes in their own[3] project specifically to send info to ceilometer.
[2] I'm not trying to defend ceilometer from slings here, just point out a good example, since it has _no_ arrows.
[3] "their own", that's hateful, let's have less of that.
Re: [openstack-dev] [all][tc] governance changes for "big tent" model
On Fri, 3 Oct 2014, Sean Dague wrote:

> OpenStack is enough parts that you can mix and match as much as you
> want, but much like the 600 config options in Nova, we really can't
> document every combination of things.

People seem to talk about this flexibility as if it were a good thing. It's not. There's tyranny of choice all over OpenStack. Is that good for real people or just large players and our corporate hosts?
Re: [openstack-dev] [all][tc] governance changes for "big tent" model
On Fri, 3 Oct 2014, Anne Gentle wrote:

> I'm reading and reading and reading and my thoughts keep returning to,
> "we're optimizing only for dev." :)

Yes, +many. In my reading it seems like we are trying to optimize the process for developers, which is exactly the opposite of what we want to be doing if we want to address the perceived quality problems that we see. We should be optimizing for the various user groups (which, I admit, have been identified pretty well in some of the blog posts). This would, of course, mean enhancing the docs (and other cross-project) process...

At the moment we're trying to create governance structures that incrementally improve the existing model for how development is being done. I think we should consider more radical changes, changes which allow us to work on what users want: an OpenStack that works. To do that I think we need to figure out two things:

* how to fail faster
* how to stop thinking of ourselves as being on particular projects

I got hired to work on telemetry, but I've managed to do most of my work in QA-related things, because what's the point of making new stuff if you can't test it reliably? What I'd really like to say my job is, is "making OpenStack the best it possibly can be". If we keep focusing on the various services as entangled but separate and competing interests, rather than on how to make OpenStack good, we're missing the point and the boat.

Our job as developers is to make things easier (or at least possible) for the people who use the stuff we build. Naturally we want to make that as frictionless as possible, but not at the cost of "the people's" ease. There are many perverse incentives in OpenStack's culture which encourage people to hoard. For example it is useful to keep code in one's own team's repository because the BPs, reviews and bugs which reflect on that repository reflect on the value of the team. Who is that good for?
So much of the talk is about trying to figure out how to make the gate more resilient. No! How about we listen to what the gate is telling us: our code is full of race conditions and poorly defined contracts, is a memory pig, and is just downright tediously slow and heavy. And _fix that_.

What I think we need to do to improve is enhance the granularity at which someone can participate. Smaller repos, smaller teams, cleaner boundaries between things. Disconnected (and rolling) release cycles. Achieve fail-fast by testing in (much) smaller lumps before code ever reaches the global CI. You know: make better tests locally that confirm good boundaries. Don't run integration tests until the unit, pep8 and in-tree functional tests have passed. If there is a failure: exit! FAIL! Don't run all the rest of the tests uselessly.

We need to not conflate the code and its structure with the structure of our governance. We need to put responsibility for the quality of the code on to the people who make it, not big infra. We need to make it easier for people to participate in that quality making. And most importantly we need to make sure the users are driving what we do, and we need to make it far easier for them to do that driving.

Obviously there are many more issues than these, but I think some of the above is being left out of the discussion, and this message needs to stop.
Re: [openstack-dev] [all] PYTHONDONTWRITEBYTECODE=true in tox.ini
On Fri, 12 Sep 2014, Doug Hellmann wrote:

> We could use a git hook (see my earlier message in this thread) or we
> could add a command to tox to remove them before starting the tests.
> Neither of those solutions would affect the runtime behavior in a way
> that makes our dev environments fundamentally different from a
> devstack or production deployment.

For reference, I've always been in the habit of managing automated tasks in code checkouts with a Makefile that has targets with dependencies: e.g. 'make test' will always do the 'clean' target, and 'clean' does something like

    find . -name "*.pyc" ...

My stumble into openstack to find tox being the one _visible_ source of automation was quite a shock. Things like git hooks do not count as visible, despite being useful. The idea being that the systems we work with as developers need to be both discoverable and introspectable. Or to be clear: transparent. tox, in general, is vaguely magical, and it would be a shame to add some more.

Yes, the pyc thing screwing with your stuff is bad and bit me hard a few times, but once I knew what it was, what I really wanted was a quick (and centrally approved) way to do 'make clean'. I guess tox -eclean or something would be the same but it feels wrong somehow.
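For the curious, the 'make clean' behaviour described above is small enough to sketch in a few lines of Python (a stand-in for the find invocation, not anything that exists in tox):

```python
import os

def clean_pyc(root="."):
    """Remove compiled bytecode the way a 'make clean' target might:
    roughly equivalent to `find . -name "*.pyc" -delete`."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed
```

Whether this lives in a Makefile, a tox environment, or a small script matters less than it being discoverable and centrally approved.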
Re: [openstack-dev] [all] PYTHONDONTWRITEBYTECODE=true in tox.ini
On Fri, 12 Sep 2014, Julien Danjou wrote:

> I guess the problem is more likely that testrepository loads the tests
> from the source directory, whereas maybe we could make it load them
> from what's installed into the venv?

This rather ruins TDD doesn't it?
Re: [openstack-dev] Treating notifications as a contract
On Wed, 10 Sep 2014, Jay Pipes wrote:

> There would be an Oslo library that would store the codification of the
> resource classes and actions, along with the mapping of
> (resource_class, action, version) to the JSONSchema document describing
> the payload field.

This seems reasonable with two caveats:

* What Sandy says about making sure the _actual_ codification is in a
  language-neutral format of some kind.
* That we don't limit my other concern (the macro one, requoted below):

      Anybody should be able to rock up and dump _new_ notifications on
      the bus given suitable credentials and configuration with neither
      writing of local custom code nor centralized reviewing of global
      code.

  (I'd really like it to be possible to publish the notifications with
  _only_ credentials and no config and have a reasonable expectation of
  them being captured, but I understand that's not very robust and
  rather pipe dreamy.)

I think it ought to be possible to accomplish both by:

* Having the library you suggest, but have it read schema from files
  rather than embedding them in code.
* Having the same library read additional files which contain 'local'
  customizations. Endpoints then know how to compose and decompose a
  vast swath of notifications. A nice-to-have would be that the schema
  are inheritable.
* At the macro level, standardizing a packaging or envelope for all
  notifications so that they can be consumed by very similar code. That
  is: constrain the notifications in some way so we can also constrain
  the consumer code.

Or maybe we should just use RDF and produce a super upper ontology and consume all the world's knowledge as events? That's been super successful in other contexts...
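For what the (resource_class, action, version) mapping might look like in practice, here is a minimal, hand-waving sketch; the registry contents and the `validate` helper are invented for illustration, and a real implementation would load JSONSchema documents from language-neutral files as discussed above:

```python
# Hypothetical sketch of a schema registry keyed by
# (resource_class, action, version). In the real proposal these would
# be JSONSchema documents read from files, not dicts embedded in code.
SCHEMAS = {
    ("instance", "create", 1): {
        "required": {"instance_id": str, "tenant_id": str, "memory_mb": int},
    },
}

def validate(resource_class, action, version, payload):
    """Return True if the payload satisfies the registered schema."""
    schema = SCHEMAS.get((resource_class, action, version))
    if schema is None:
        # Unknown notifications pass through untouched: anybody should
        # be able to put *new* notifications on the bus without central
        # review of global code.
        return True
    for field, ftype in schema["required"].items():
        if not isinstance(payload.get(field), ftype):
            return False
    return True
```

The pass-through branch for unregistered keys is the sketch's nod to the "rock up and dump new notifications" caveat: validation constrains known notifications without gatekeeping new ones.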
Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting
On Tue, 9 Sep 2014, Samuel Merritt wrote:

> On 9/9/14, 4:47 PM, Devananda van der Veen wrote:
>> The questions now before us are:
>> - should OpenStack include, in the integrated release, a
>>   messaging-as-a-service component?
>
> I certainly think so. I've worked on a few reasonable-scale web
> applications, and they all followed the same pattern: HTTP app servers
> serving requests quickly, background workers for long-running tasks,
> and some sort of durable message-broker/queue-server thing for
> conveying work from the first to the second.
>
> A quick straw poll of my nearby coworkers shows that every non-trivial
> web application that they've worked on in the last decade follows the
> same pattern. While not *every* application needs such a thing, web
> apps are quite common these days, and Zaqar satisfies one of their big
> requirements. Not only that, it does so in a way that requires much
> less babysitting than run-your-own-broker does.

I don't think there's any question about the value of a "durable message-broker/queue-server thing" in the general case. The question is whether OpenStack is in the business of satisfying that case.

Which leads inevitably to the existential questions of

    What is OpenStack in the business of?

and

    How do we stay sane while being in that business?

Every long thread over the last couple of months has trended towards those questions. It's getting pretty tiresome. We'd all be a lot more focused if we knew the answer.
Re: [openstack-dev] Kilo Cycle Goals Exercise
On Sun, 7 Sep 2014, Monty Taylor wrote: 1. Caring about end user experience at all 2. Less features, more win 3. Deleting things Yes. I'll give away all of my list for any one of these. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] Kilo Cycle Goals Exercise
On Wed, 3 Sep 2014, Joe Gordon wrote: Have anyone interested (especially TC members) come up with a list of what they think the project wide Kilo cycle goals should be and post them on this thread by end of day Wednesday, September 10th. After which time we can begin discussing the results. I think this is a good idea, but the timing (right at the end of j-3) might be problematic. I'll jump in, despite being a newb; perhaps that perspective is useful. I'm sure these represent the biases of my limited experience, so apply salt as required and please be aware that I'm not entirely ignorant of the fact that there are diverse forces of history that led to the present. Things I'd like to help address in Kilo: * Notifications as a contract[1], better yet as events, with events taking primacy over projects. The main thrust of this topic has been the development of standards that allow endpoints to have some confidence that what is sent or received is the right thing. This is a good thing, but I think misses a larger issue with the notification environment. One of my first BPs was to make Ceilometer capable of hearing notifications from Ironic that contain metrics generated from IPMI readings. I was shocked to discover that _code_ was required to make this happen; my newbie naivety thought it ought to just be a configuration change: a dict on the wire transformed into a data store. I was further shocked to discover that the message bus was being modeled as RPC. I had assumed that at the scale OpenStack is expected to operate most activity on the bus would be modeled as events and swarms of semi-autonomous agents would process them. In both cases my surprise was driven by what I perceived to be a bad ordering of priority between projects and events in the discussion of "making things happen". In this specific case the idea was presented as _Ironic_ needs to send some information to _Ceilometer_. 
Would it not be better to say: "there is hardware health information that happens and various things can process it"? With that prioritization lots of different tools can produce and access the information. * Testing is slow and insufficiently reliable. Despite everyone's valiant efforts this is true: we see evidence all over this list of trouble at the level of integration testing and testing during the development process. My own experience has been that the tests (that is, the way they are written and run) are relatively okay at preventing regression but not great at enabling TDD nor at being a pathway to understanding the code. This is probably because I think OO unittests are wack so just haven't developed the skill to read them well, but still: Tests are hard and that makes it harder to make good code. We can and should make it better. Facile testing makes it a lot easier to do the tech debt cleanup that everyone(?) says we need. I reckon the efforts to library-ize tempest and things like Monty's dox will be useful tools. * Containers are a good idea, let's have more of them. There's a few different ways in which this matters: * "Skate to where the puck will be, not where it is" or "ZOMG VMs are like so last decade". * dox, as above * Containerization of OpenStack services for easy deployment and development. Perhaps `dock_it` instead of `screen_it` in devstack. * Focus on user experience. This one is the most important. The size and number of projects that assemble to become OpenStack inevitably leads to difficulty seeing the big picture when focusing on the individual features within each project. OpenStack is big, hard to deploy and manage, and challenging to understand and use effectively. I _really_ like Sean Dague's idea (sorry, I've lost the ref) that OpenStack needs to be usable and useful to small universities that want to run relatively small clouds. 
I think this needs to be true _without_ the value-adds that our corporate benefactors package around the core to ease deployment and management. Or to put all this another way: As we are evaluating what we want to do and how we want to do it we need to think less about the projects and technologies that are involved and more about the actions and results that our efforts hope to allow and enable. [1] http://lists.openstack.org/pipermail/openstack-dev/2014-September/044748.html -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] how to provide tests environments for python things that require C extensions
On Fri, 5 Sep 2014, Monty Taylor wrote: The tl;dr is "it's like tox, except it uses docker instead of virtualenv" - which means we can express all of our requirements, not just pip ones. Oh thank god[1]. Seriously. jogo started a thread about "what matters for kilo" and I was going to respond with (amongst other things) "get containers into the testing scene". Seems you're way ahead of me. Docker's caching could be a _huge_ win here. [1] https://www.youtube.com/watch?v=om5rbtudzrg across the project. Luckily, docker itself does an EXCELLENT job at handling caching and reuse - so I think we can have a set of containers that something in infra (waves hands) publishes to dockerhub, like: infra/py27 infra/py26 I'm assuming these would get rebuilt regularly (every time global requirements and friends are updated) on some sort of automated hook? Thoughts? Anybody wanna hack on it with me? I think it could wind up being a pretty useful tool for folks outside of OpenStack too if we get it right. Given availability (currently an unknown) I'd like to help with this. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, 5 Sep 2014, Daniel P. Berrange wrote: I venture to suggest that the reason we care so much about those kind of things is precisely because of our policy of pulling them in the tree. Having them in tree means their quality (or not) reflects directly on the project as a whole. Separate them from Nova as a whole and give them control of their own destiny and they can deal with the consequences of their actions and people can judge the results for themselves. Apart from any of the other issues present in this thread (and not commenting on them in this message), I think this paragraph (above) represents an unfortunately narrow view about how perceptions of the quality of OpenStack work. People who are invested in using OpenStack in some fashion and are not in the development priesthood see OpenStack as a whole. They don't see individual teams making virt drivers. It may be (I don't know) that having more granularity in projects will allow different teams to engage at different rates and thus get stuff done, but I do not think it will do much with regard to external perceptions of quality. That's going to take a much different kind of work and attention. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting
On Thu, 4 Sep 2014, Flavio Percoco wrote: Thanks for writing this up, interesting read. 5. Ceilometer's recommended storage driver is still MongoDB, although Ceilometer has now support for sqlalchemy. (Please correct me if I'm wrong). For sake of reference: Yes, MongoDB is currently the recommended store and yes, sqlalchemy support is present. Until recently only sqlalchemy support was tested in the gate. Two big changes being developed in Juno related to storage: * Improved read and write performance in the sqlalchemy setup. * time series storage and Gnocchi: https://julien.danjou.info/blog/2014/openstack-ceilometer-the-gnocchi-experiment I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't keep avoiding these technologies. NoSQL technologies have been around for years and we should be prepared - including OpenStack operators - to support these technologies. Not every tool is good for all tasks - one of the reasons we removed the sqlalchemy driver in the first place - therefore it's impossible to keep a homogeneous environment for all services. +1. Ain't that the truth. As mentioned in the meeting on Tuesday, Zaqar is not reinventing message brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack flavor on top. [0] In my efforts to track this stuff I remain confused on the points in these two questions: https://wiki.openstack.org/wiki/Zaqar/Frequently_asked_questions#How_does_Zaqar_compare_to_oslo.messaging.3F https://wiki.openstack.org/wiki/Zaqar/Frequently_asked_questions#Is_Zaqar_an_under-cloud_or_an_over-cloud_service.3F What or where is the boundary between Zaqar and existing messaging infrastructure? Not just in terms of technology but also use cases? The answers above suggest it's not super solid on the use case side, notably: "In addition, several projects have expressed interest in integrating with Zaqar in order to surface events..." 
Instead of Zaqar doing what it does and instead of oslo.messaging abstracting RPC, why isn't the end goal a multi-tenant, multi-protocol event pool? Wouldn't that have the most flexibility in terms of ecosystem and scalability? In addition to the aforementioned concerns and comments, I also would like to share an etherpad that contains some use cases that other integrated projects have for Zaqar[0]. The list is not exhaustive and it'll contain more information before the next meeting. [0] https://etherpad.openstack.org/p/zaqar-integrated-projects-use-cases For these, what is Zaqar providing that oslo.messaging (and its still extant antecedents) does not? I'm not asking to naysay Zaqar, but to understand more clearly what's going on. My interest here comes from a general interest in how events and notifications are handled throughout OpenStack. Thanks. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] Treating notifications as a contract
On Wed, 3 Sep 2014, Sandy Walsh wrote: Is there anything slated for the Paris summit around this? There are plans to make plans, but that's about all I know. I just spent nearly a week parsing Nova notifications and the pain of no schema has overtaken me. /me passes the ibuprofen We're chatting with IBM about CADF and getting down to specifics on their applicability to notifications. Once I get StackTach.v3 into production I'm keen to get started on revisiting the notification format and oslo.messaging support for notifications. Perhaps a hangout for those keenly interested in doing something about this? That seems like a good idea. I'd like to be a part of that. Unfortunately I won't be at summit but would like to contribute what I can before and after. I took some notes on this a few weeks ago and extracted what seemed to be the two main threads or ideas that were revealed by the conversation that happened in this thread: * At the micro level have versioned schema for notifications such that one end can declare "I am sending version X of notification foo.bar.Y" and the other end can effectively deal. * At the macro level standardize a packaging or envelope of all notifications so that they can be consumed by very similar code. That is: constrain the notifications in some way so we can also constrain the consumer code. These ideas serve two different purposes: One is to ensure that existing notification use cases are satisfied with robustness and provide a contract between two endpoints. The other is to allow a fecund notification environment that allows and enables many participants. Is that a good summary? What did I leave out or get wrong? -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
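The micro and macro levels above can be sketched together in a few lines. This is only a toy to make the shape of the idea visible; the envelope fields, the handler registry, and the event names are all invented for illustration, not anything oslo.messaging or CADF actually defines.

```python
import json
import uuid
from datetime import datetime, timezone

def make_envelope(event_type, version, payload):
    """Every notification gets the same outer fields (the macro level);
    only the payload varies by (event_type, version) (the micro level)."""
    return {
        "message_id": str(uuid.uuid4()),
        "event_type": event_type,   # e.g. "compute.instance.create"
        "version": version,         # the schema version the sender claims
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }

HANDLERS = {}

def handles(event_type, version):
    """Register a handler for one (event_type, version) pair."""
    def register(fn):
        HANDLERS[(event_type, version)] = fn
        return fn
    return register

@handles("compute.instance.create", 1)
def instance_create_v1(payload):
    return "created %s" % payload["instance_id"]

def consume(raw):
    """Because the envelope is constrained, the consumer is generic:
    unpack, dispatch on (event_type, version), done."""
    msg = json.loads(raw)
    handler = HANDLERS.get((msg["event_type"], msg["version"]))
    if handler is None:
        raise ValueError("no handler for %s v%s"
                         % (msg["event_type"], msg["version"]))
    return handler(msg["payload"])

result = consume(json.dumps(make_envelope(
    "compute.instance.create", 1, {"instance_id": "abc-123"})))
print(result)  # created abc-123
```

The contract is the envelope plus the declared version; everything inside the payload can evolve as long as senders say which version they are speaking.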
[openstack-dev] [qa] [ceilometer] [swift] tempests tests, grenade, old branches
I've got a review in progress for adding a telemetry scenario test: https://review.openstack.org/#/c/115971/ It can't pass the *-icehouse tests because ceilometer-api is not present on the icehouse side of a havana->icehouse upgrade. In the process of trying to figure out what's going on I discovered so many confusing things that I'm no longer clear on: * Whether this is a fixable problem? * Whether it is worth fixing? * How (or if) it is possible to disable the test in question for older branches? * Maybe I should scrap the whole thing?[1] The core problem is that older branches of grenade do not have an upgrade-ceilometer, so though some ceilometer services do run in Havana they are not restarted over the upgrade gap. Presumably that could be fixed by backporting some stuff to the relevant branch. I admit, though, that at times it can be rather hard to tell which branch during a grenade run is providing the configuration and environment variables. In part this is due to an apparent difference in default local behavior and gate behavior. Suppose I wanted to replicate on a local setup exactly what happens on a gate run; where do I go to figure that out? That seems a bit fragile, though. Wouldn't it be better to upgrade services based on what services are actually running, rather than some lines in a shell script? I looked into how this might be done and the mapping from ENABLED_SERVICES to actually-running-processes to some-generic-name-to-identify-an-upgrade is not at all straightforward. I suspect this is a known problem that people would like to fix, but I don't know where to look for more discussion on the matter. Please help? [1] And finally, the other grenade runs, those that are currently passing are only passing because a very long loop is waiting up to two minutes for notification messages (from the middleware) to show up at the ceilometer collector. 
Is this because the instance is just that overloaded and process contention is so high and it is just going to take that long? If so, is there much point having a test which introduces this kind of potential legacy? A scenario test appears to be exactly what's needed here, but at what cost? What I'm after here is basically threefold: * Pointers to written info on how I can resolve these issues, if it exists. * If it doesn't, some discussion here on options to reach some resolution. * A cup of tea or other beverage of your choice and some sympathy and commiseration. A bit of "I too have suffered at the hands of grenade". Then we can all be friends. From my side I can provide a promise to follow through on improvements we discover. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
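As a strawman for "upgrade what is actually enabled, not what a script hardcodes", something like the following shape. All the names here are invented and grenade does not currently work this way; as noted above, the real difficulty is mapping ENABLED_SERVICES entries to running processes and to a generic upgrade name, which this toy sidesteps with a hand-maintained prefix table.

```shell
# Hypothetical: derive which upgrade-* scripts to run from the enabled
# service list rather than from hardcoded lines in a shell script.
ENABLED_SERVICES="key,n-api,n-cpu,ceilometer-acompute,ceilometer-collector"

service_enabled() {
    # crude prefix match against the comma-separated service list,
    # so "ceilometer-" catches ceilometer-acompute, -collector, etc.
    case ",$ENABLED_SERVICES," in
        *,"$1"*) return 0 ;;
        *) return 1 ;;
    esac
}

# hand-maintained prefix -> project table (the hard part in real life)
for entry in "n-:nova" "ceilometer-:ceilometer" "zaqar-:zaqar"; do
    prefix=${entry%%:*}
    project=${entry#*:}
    if service_enabled "$prefix"; then
        echo "would run upgrade-$project"
    fi
done
```

With the example list above this prints `would run upgrade-nova` and `would run upgrade-ceilometer` and skips zaqar, which is roughly the behavior a havana->icehouse run would have needed for ceilometer.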
Re: [openstack-dev] [all] The future of the integrated release
On Wed, 27 Aug 2014, Doug Hellmann wrote: I have found it immensely helpful, for example, to have a written set of the steps involved in creating a new library, from importing the git repo all the way through to making it available to other projects. Without those instructions, it would have been much harder to split up the work. The team would have had to train each other by word of mouth, and we would have had constant issues with inconsistent approaches triggering different failures. The time we spent building and verifying the instructions has paid off to the extent that we even had one developer not on the core team handle a graduation for us. +many more for the relatively simple act of just writing stuff down -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [all] The future of the integrated release
On Wed, 27 Aug 2014, Doug Hellmann wrote: For example, Matt helped me with an issue yesterday, and afterwards I asked him to write up a few details about how he reached his conclusion because he was moving fast enough that I wasn’t actually learning anything from what he was saying to me on IRC. Having an example with some logs and then even stream of consciousness notes like “I noticed the out of memory error, and then I found the first instance of that and looked at the oom-killer report in syslog to see which process was killed and it was X which might mean Y” would help. +many I'd _love_ to be more capable at gate debugging. That said, it does get easier just by doing it. The first many times is like beating my head against the wall, especially the constant sense of where am I and where do I need to go. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [all] [glance] python namespaces considered harmful to development, lets not introduce more of them
On Wed, 27 Aug 2014, Sean Dague wrote: Here is the problem when it comes to working with code from git, in python, that uses namespaces: it's kind of a hack that violates the principle of least surprise. It's true this problem does happen... So I'd like us to revisit using a namespace for glance, and honestly, for other places in OpenStack, because these kinds of violations of the principle of least surprise are something that I'd like us to be actively minimizing. ...but on the other hand using namespaces can be really handy. For sake of example, the thing I used to work on (TiddlyWeb) uses namespace packages for plugins. The namespace is (surprise) 'tiddlywebplugins'[1]. To get around the issues you're describing, especially during development, we cooked up an ugly hack called the "mangler" that establishes the namespace properly. You can see a sample in Tank, which is a plugin for TiddlyWeb: https://github.com/cdent/tank/blob/master/mangler.py Yes, it's ooogly, but if you want namespaces (and under the right circumstances they can be "one honking great idea"), it helps. May be something to steal there. [1] I do think, when using namespaces, that you must have a namespace for the extensions different from whatever the thing being extended is using. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
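For anyone who hasn't run into pre-PEP-420 namespace packages, here is a tiny self-contained demo of the mechanism the mangler wrestles with: two separate install locations both contributing modules to one package. The 'nsdemo', 'site_a'/'site_b', 'alpha'/'beta' names are made up for the demo.

```python
import os
import sys
import tempfile

# Build two separate source trees that both provide a 'nsdemo' package.
base = tempfile.mkdtemp()
for tree, module in [("site_a", "alpha"), ("site_b", "beta")]:
    pkg = os.path.join(base, tree, "nsdemo")
    os.makedirs(pkg)
    # pre-PEP-420 style: every copy of the package declares the namespace
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("from pkgutil import extend_path\n"
                "__path__ = extend_path(__path__, __name__)\n")
    with open(os.path.join(pkg, module + ".py"), "w") as f:
        f.write("NAME = %r\n" % module)
    sys.path.append(os.path.join(base, tree))

# Both halves import even though they live in different directories,
# because extend_path merged the two nsdemo/ dirs into one __path__.
from nsdemo import alpha, beta
print(alpha.NAME, beta.NAME)  # alpha beta
```

The "works from git checkouts" pain Sean describes shows up when one of those trees is an in-development checkout whose `__init__.py` shadows, or is shadowed by, an installed copy; that is the case the mangler papers over.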
Re: [openstack-dev] [all] The future of the integrated release
On Wed, 27 Aug 2014, Angus Salkeld wrote: I believe developers working on OpenStack work for companies that really want this to happen. The developers also want their projects to be well regarded. Just the way the problem is framed, a bit like you did above, is very daunting for any one person to solve. If we can quantify the problem, break the work into doable items of work (bugs) and prioritize it, it will be solved a lot faster. Yes. It's very easy when encountering organizational scaling issues to start catastrophizing and then throwing all the extant problems under the same umbrella. This thread (and the czar one) has grown to include a huge number of problems. We could easily change the subject to just "The Future". I think two things need to happen: * Be rational about the fact that at least in some areas we are trying to do too much with too little. Strategically that means we need: * to prioritize and decompose issues (of all sorts) better * to get more resources (human and otherwise) That first is on us. The second I guess gets bumped up to the people with the money; one aspect of being rational is utilizing the fact that though OpenStack is open source, it is to a very large extent corporate open source. If the corps need to step up, we need to tell them. * Do pretty much exactly what Angus says: 10 identify bugs (not just in code) 20 find groups who care about those bugs 30 fix em 40 GOTO 10 # FOR THE REST OF TIME We all know this, but I get the impression it can be hard to get traction. I think a lot of the slipping comes from too much emphasis on the different projects. It would be better to think "I work on OpenStack" rather than "I work on Ceilometer" (or whatever). I'm not opposed to process and bureaucracy; it can be a very important part of the puzzle of getting lots of different groups to work together. 
However, an increase in both can be a bad smell indicating an effort to hack around things that are perceived to be insurmountable problems (e.g. getting more nodes for CI, having more documentors, etc.). -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [qa] In-tree functional test vision
On Mon, 25 Aug 2014, Joe Gordon wrote: [Other stuff snipped, thanks for that, good to have some pointers.] Why can't you run devstack locally? Maybe there are some changes we can make so it's easier to run devstack locally first. I do run a local devstack, and throw in some tempest and grenade every now and again too. But in terms of automated local testing in the project tree there are places it is difficult for clean unit tests to reach. Sure we can make really hairy mocks, but that results in tests which a) make no sense and b) are hard to have any confidence in. Thus "in tree functional tests": * to reach places unit tests won't go * to not have the noise of all that mock and OO mess * to have some faith in the end to end The sorts of things that require provisioning of temporary datastores, interception of wsgi apps, in process message queues... -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [all] The future of the integrated release
On Thu, 21 Aug 2014, Sean Dague wrote: By blessing one team what we're saying is all the good ideas pool for tackling this hard problem can only come from that one team. This is a big part of this conversation that really confuses me. Who is that "one team"? I don't think it is that team that is being blessed, it is that project space. That project space ought, if possible, have a team made up of anyone who is interested. Within that umbrella both the competition and cooperation that everyone wants can happen. You're quite right Sean, there is a lot of gravity that comes from needing to support and slowly migrate the existing APIs. That takes up quite a lot of resources. It doesn't mean, however, that other resources can't work on substantial improvements in cooperation with the rest of the project. Gnocchi and the entire "V3" concept in ceilometer are a good example of this. Some folk are working on that and some folk are working on maintaining and improving the old stuff. Some participants in this thread seem to be saying "give someone else a chance". Surely nobody needs to be given the chance, they just need to join the project and make some contributions? That is how this is supposed to work isn't it? -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [ceilometer] indicating sample provenance
On Thu, 21 Aug 2014, Nejc Saje wrote: More riffing: we are moving away from per-sample specific data with Gnocchi. I don't think we should store this per-sample, since the user doesn't actually care about which agent the sample came from. The user cares about which *resource* it came from. I'm thinking from a debugging and auditing standpoint it is useful to know the hops an atom of data has taken on its way to its final destination. Under normal circumstances that info isn't needed, but under extraordinary circumstances it could be useful. I could see this going into an agent's log. On each polling cycle, we could log which *resources* we are responsible for (not samples). If it goes in the agent's log how do you associate a particular sample with that log? From the sample (or resource metadata or what have you) you can know the time window of the resource. Now you need to go looking around all the agents to find out which one was satisfying that resource within that time window. If there are two agents, no big deal; if there are 2000, problem. And besides: Consider integration testing scenarios: making the data a bit more meaningful will make it possible to do more flexible testing. I appreciate that searching through endless log files is a common task in OpenStack but that doesn't make it the best way. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [ceilometer] indicating sample provenance
On Wed, 20 Aug 2014, gordon chung wrote: disclaimer: i'm just riffing and the following might be nonsense. /me is a huge fan of riffing i guess also to extend your question about agents leaving/joining. i'd expect there is some volatility to the agents where an agent may or may not exist at the point of debugging... just curious what the benefit is of knowing who sent it if all the agents are just clones of each other. What I'm thinking of is a situation where some chunk of samples is arriving at the data store and is in some fashion outside the expected norms when compared to others. If, from looking at the samples, you can tell that they were all published from the (used-to-be-)central-agent on host X then you can go to host X and have a browse around there to see what might be up. It's unlikely that the agent is going to be the cause of any weirdness but if it _is_ then we'd like to be able to locate it. As things currently stand there's no way, from the sample itself, to do so. Thus, the "benefit of knowing who sent it" is that though the agents themselves are clones, they are in regions and on hosts that are not. Beyond all those potentially good reasons there's also just the simple matter that it is good data hygiene to know where stuff came from. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
[openstack-dev] [ceilometer] indicating sample provenance
One of the outcomes from Juno will be horizontal scalability in the central agent and alarm evaluator via partitioning[1]. The compute agent will get the same capability if you choose to use it, but it doesn't make quite as much sense. I haven't investigated the alarm evaluator side closely yet, but one concern I have with the central agent partitioning is that, as far as I can tell, it will result in stored samples that give no indication of which (of potentially very many) central agent they came from. This strikes me as a debugging nightmare when something goes wrong with the content of a sample that makes it all the way to storage. We need some way, via the artifact itself, to narrow the scope of our investigation. a) Am I right that no indicator is there? b) Assuming there should be one: * Where should it go? Presumably it needs to be an attribute of each sample because as agents leave and join the group, where samples are published from can change. * How should it be named? The never-ending problem. Thoughts? [1] https://review.openstack.org/#/c/113549/ [2] https://review.openstack.org/#/c/115237/ -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
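As a strawman for (b), the smallest possible version is an agent-unique identifier stamped onto each sample at publish time. The field name "published_by", its host:uuid shape, and the sample layout below are all invented for discussion, not the actual ceilometer model.

```python
import socket
import uuid

# One identifier per agent process: which host it runs on, plus a uuid
# so two agents on the same host are distinguishable.
AGENT_ID = "%s:%s" % (socket.gethostname(), uuid.uuid4())

def publish(sample):
    """Annotate a sample with the identity of the agent publishing it,
    so the stored artifact itself says where it came from."""
    sample["published_by"] = AGENT_ID
    return sample

sample = publish({
    "counter_name": "hardware.ipmi.temperature",
    "resource_id": "node-1",
    "counter_volume": 42.0,
})
print(sample["published_by"])
```

With something like this in place, the debugging scenario becomes "read the odd sample, go to that host" rather than trawling the logs of every agent that might have held the resource during the time window.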
Re: [openstack-dev] [qa] Accessing environment information in javelin2
On Mon, 18 Aug 2014, Chris Dent wrote: The reason for doing this? I want to be able to confirm that some sample data retrieved in a query against the ceilometer API has samples that span the upgrade. The associated change is here: https://review.openstack.org/#/c/102354 -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
[openstack-dev] [qa] Accessing environment information in javelin2
To make some time oriented comparisons in javelin2 I'd like to be able to access the timestamps on the data dumps in the $SAVE_DIR. In my experiments I've done this by pushing SAVE_DIR and BASE_RELEASE into the subshell that calls javelin2 -m create in grenade.sh. Is there: * A better way to get those two chunks of info (that is, without changing grenade.sh)? * Some other way to get a timestamp that is a time shortly before the services in the TARGET_RELEASE have started? The reason for doing this? I want to be able to confirm that some sample data retrieved in a query against the ceilometer API has samples that span the upgrade. Thoughts? Thanks. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [all] The future of the integrated release
On Fri, 15 Aug 2014, Sandy Walsh wrote: I recently suggested that the Ceilometer API (and integration tests) be separated from the implementation (two repos) so others might plug in a different implementation while maintaining compatibility, but that wasn't well received. Personally, I'd like to see that model extended for all OpenStack projects. Keep compatible at the API level and welcome competing implementations. I think this is a _very_ interesting idea, especially the way it fits in with multiple themes that have bounced around the list lately, not just this thread: * Improving project-side testing; that is, pre-gate integration testing. * Providing a framework (at least conceptual) on which to inform the tempest-libification. * Solidifying both intra- and inter-project API contracts (both HTTP and notifications). * Providing a solid basis on which to enable healthy competition between implementations. * Helping to ensure that the various projects work to the goals of their public facing name rather than their internal name (e.g. Telemetry vs ceilometer). Given the usual trouble with resource availability it seems best to find tactics that can be applied to multiple strategic goals. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [QA] Picking a Name for the Tempest Library
On Fri, 15 Aug 2014, Jay Pipes wrote: I suggest that "tempest" should be the name of the import'able library, and that the integration tests themselves should be what is pulled out of the current Tempest repository, into their own repo called "openstack-integration-tests" or "os-integration-tests". This idea is best, but if a new name is required, tempit is good because it is a) short b) might subconsciously remind people that testing ought to be fast(-ish). -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] The future of the integrated release
On Fri, 8 Aug 2014, Nikola Đipanov wrote: To me the runway approach seems like yet another set of arbitrary hoops that we will put in place so that we don't have to tell people that we don't have bandwidth/willingness to review and help their contribution in. I pretty much agree with this. As things stand there are a lot of hoops for casual contributors. For the people who make more regular contributions these hoops are either taken as the norm and good safety precautions or are an annoying tax that you just kind of have to deal with. Few of those hoops say clearly and explicitly that the project is resource constrained. There are certainly lots of clues and cues that this is going on. It would be best to be as open and upfront as possible. Meanwhile, there are fairly perverse incentives in place that work against strategic contribution, despite many people acknowledging the need to be more strategic. It's a tricky problem. If there really is a resource starvation problem, it is best to be honest that this is a project that is primarily funded by and staffed from organizational members. It is from there that strategic resources will have to come, in part because of the incentives, in part because those organizational members want a healthy framework on which to lay their tactical changes and a context in which to say "lookee, we're a part of this big deal thing". But after all that it's important to keep in mind that shit's not broken: Every few days I'll update all my various repos and think "wow that's an awful lot of changed code". -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [ceilometer] [swift] Improving ceilometer.objectstore.swift_middleware
On Fri, 8 Aug 2014, Osanai, Hisashi wrote: Is there any way to proceed ahead the following topic? There are three active reviews that are somewhat related to this topic:

* Use a FakeRequest object to test middleware: https://review.openstack.org/#/c/110302/
* Publish samples on other threads: https://review.openstack.org/#/c/110257/
* Permit usage of notifications for metering: https://review.openstack.org/#/c/80225/

The third one provides a way to potentially overcome the existing performance problems that the second one is trying to fix. These may not be directly what you want, but are something worth tracking as you explore and think. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [qa] In-tree functional test vision
In the "Thoughts on the patch test failure rate and moving forward" thread[1] there's discussion of moving some of the burden for functional testing to the individual projects. This seems like a good idea to me, but also seems like it could be a source of confusion so I thought I'd start another thread to focus on the details of just this topic, separate from the gate-oriented discussion in the other. In a couple of messages[2] Sean mentions "the vision". Is there a wiki page or spec or other kind of document where this nascent vision is starting to form? Even if we can't quite get started just yet, it would be good to have an opportunity to think about the constraints and goals that we'll be working with. Not just the goal of moving tests around, but what, for example, makes a good functional test? For constraints: Will tempest be available as a stable library? Is using tempest (or the same library across all projects) a good or bad thing? Seems there's some disagreement on both of these. Personally I'm quite eager to vastly increase the amount of testing I can do on my own machine(s) before letting the gate touch my code. [1] http://lists.openstack.org/pipermail/openstack-dev/2014-July/thread.html#41057 [2] http://lists.openstack.org/pipermail/openstack-dev/2014-July/041188.html http://lists.openstack.org/pipermail/openstack-dev/2014-July/041252.html -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] How to get testr to failfast
On Thu, 31 Jul 2014, Dmitry Tantsur wrote: It would be my 2nd wanted feature in our test system (after getting reasonable error message (at least not binary) in case of import errors :) I managed to figure out a way to exit on first failure so added a FAQ section to the Testr page on the wiki: https://wiki.openstack.org/wiki/Testr#FAQ and included it in there. With luck other people will add stuff. Now to see about speed. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [ceilometer] [swift] Improving ceilometer.objectstore.swift_middleware
On Thu, 31 Jul 2014, Julien Danjou wrote: I'm just thinking out loud and did not push that through, but I wonder if we should not try to use the oslo.messaging notifier middleware for that. It would be more standard (as it's the one usable on all HTTP pipelines) and rely on notification and generates events, as anyway, HTTP requests are events. Then it'd be up to Ceilometer to handle those notifications like it does for the rest of OpenStack. I assume you mean this stuff: https://github.com/openstack/oslo-incubator/tree/master/openstack/common/middleware If this is an option, and having swift "own this" is not, that's certainly more in line with what I would prefer: a common-code based solution with no swift dependencies. That would get both of the issues from the aforementioned bugs: * the presumed more performant use of notifications * get rid of the swift dependencies Will link to this thread from the bugs for visibility. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [all] How to get testr to failfast
One of the things I like to be able to do when in the middle of making changes is sometimes run all the tests to make sure I haven't accidentally caused some unexpected damage in the neighborhood. If I have I don't want the tests to all run, I'd like to exit on first failure. This is a common feature in lots of testrunners but I can't seem to find a way to make it happen when testr is integrated with setuptools. Anyone know a way? There's this: https://bugs.launchpad.net/testrepository/+bug/1211926 But it is not clear how or where to effectively pass the right argument, either from the command line or in tox.ini. Even if you don't know a way, I'd like to hear from other people who would like it to be possible. It's one of several testing habits I have from previous worlds that I'm missing and doing a bit of commiseration would be a nice load off. Thanks. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [ceilometer] [swift] Improving ceilometer.objectstore.swift_middleware
ceilometer/objectstore/swift_middleware.py[1] counts the size of web request and response bodies through the swift proxy server and publishes metrics of the size of the request and response and that a request happened at all. There are (at least) two bug reports associated with this bit of code:

* avoid requirement on tarball for unit tests https://bugs.launchpad.net/ceilometer/+bug/1285388
* significant performance degradation when ceilometer middleware for swift proxy uses https://bugs.launchpad.net/ceilometer/+bug/1337761

On the first bug the goal is to remove the dependency on swift from ceilometer. This is halfway done but there are barriers[2] with regard to the apparently unique way that swift does logging and the fact that InputProxy and split_path live in swift rather than some communal location. The barriers may be surmountable but if other things in the same context are changing, it might not be necessary. On the second bug, while the majority of the performance cost is in the call to rpc_server.cast(), achieving maximum performance would probably come from doing the counts and notifications _not_ in middleware. The final application in the WSGI stack will know the size of requests and responses without needing to recalculate. May as well use that. These two situations overlap in a few ways that suggest we could make some improvements. I'm after input from both the swift crew and the ceilometer crew to see if we can reach something that is good for the long term rather than short term fixes to these bugs. Some options appear to be:

* Move the middleware to swift or move the functionality to swift. In the process make the functionality drop generic notifications for storage.objects.incoming.bytes and storage.objects.outgoing.bytes that anyone can consume, including ceilometer. This could potentially address both bugs.
* Move or copy swift.common.utils.{InputProxy,split_path} to somewhere in oslo, but keep the middleware in ceilometer. 
This would require somebody sharing the info on how to properly participate in swift's logging setup without incorporating swift. This would fix the first bug without saying anything about the second.

* Carry on importing the swift tarball or otherwise depending on swift. Fixes neither bug, maintains status quo.

What are other options? Of those above which are best or most realistic? Personally I'm a fan of the first option: move the functionality into swift and take it out of middleware. This gets the maximum win for performance and future flexibility (of consumers). [1] https://github.com/openstack/ceilometer/blob/master/ceilometer/objectstore/swift_middleware.py [2] https://review.openstack.org/#/c/110302/ -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
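To make the first option in the message above concrete, here is a purely hypothetical sketch (not an existing API or a settled schema) of the kind of generic per-request notification swift could drop for anyone to consume; only the storage.objects.*.bytes event names come from the discussion, everything else is illustrative:

```python
# Hypothetical sketch of a generic notification emitted where the byte
# counts are already known (the final application in the WSGI stack).
# All field names below are illustrative assumptions.
def storage_notification(direction, tenant_id, nbytes):
    event_type = "storage.objects.%s.bytes" % direction  # incoming/outgoing
    return {
        "event_type": event_type,
        "payload": {
            "tenant_id": tenant_id,
            "value": nbytes,  # measured once, not recalculated in middleware
            "unit": "B",
        },
    }
```

A consumer such as ceilometer could then treat these like any other notification, with no swift imports required.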
Re: [openstack-dev] [PKG-Openstack-devel] Bug#755315: [Trove] Should we stop using wsgi-intercept, now that it imports from mechanize? this is really bad!
On Tue, 29 Jul 2014, Chris Dent wrote: Let me know whenever you have a new release, without mechanize as new dependency, or with it being optional. It will be soon (a day or so). https://pypi.python.org/pypi/wsgi_intercept is now at 0.8.0 All traces of mechanize removed. Have at. Enjoy. If there are issues please post them in the github issues https://github.com/cdent/python3-wsgi-intercept/issues first before the openstack-dev list... Please note that the long term plan is likely to be that _all_ the interceptors will be removed and will be packaged as their own packages with the core package only providing the faked socket and environ infrastructure for the interceptors to use. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [PKG-Openstack-devel] Bug#755315: [Trove] Should we stop using wsgi-intercept, now that it imports from mechanize? this is really bad!
On Tue, 29 Jul 2014, Thomas Goirand wrote: Sorry, I couldn't reply earlier. No problem. However, from *your* perspective, I wouldn't advise that you keep using such a dangerous, badly maintained Python module. Saying that it's optional may look like you think mechanize is ok and you are vouching for it, when it really shouldn't be the case. Having clean, well maintained dependencies, is IMO very important for a given python module. It shows that you care no bad module gets in. I've pointed a couple of the other wsgi-intercept contributors to this thread to get their opinions on which way is the best way forward, I'd prefer not to make the decision solo. Let me know whenever you have a new release, without mechanize as new dependency, or with it being optional. It will be soon (a day or so). -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Trove] Should we stop using wsgi-intercept, now that it imports from mechanize? this is really bad!
On Mon, 28 Jul 2014, Thomas Goirand wrote: That's exactly the version which I've been looking at. The thing is, when I run the unit test with that version, it just bombs on me because mechanize isn't there. How would you feel about it being optionally available, with the tests for mechanize only running if someone has already preinstalled mechanize? That is, the tests will skip if import mechanize is an ImportError? While I'm not in love with mechanize, if it is a tool that _some_ people use, then I don't want wsgi-intercept to not be useful to them. Please let me know if you can release a new version of wsgi-intercept cleaned from any trace of mechanize, or if you think this can't be done. Let me know if the above idea can't work. Depending on your answer I'll either release a version as described, or go ahead and flush it. If you get back to me by tomorrow morning (UTC) I can probably get the new version out tomorrow too. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
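The skip-on-ImportError idea proposed above is a standard stdlib pattern; a minimal sketch (the test class and its body are invented placeholders, not wsgi-intercept's actual tests):

```python
import unittest

# Opt-in pattern: mechanize-specific tests skip themselves cleanly
# when the module isn't installed, instead of bombing at import time.
try:
    import mechanize  # noqa: F401
    HAS_MECHANIZE = True
except ImportError:
    HAS_MECHANIZE = False

class MechanizeInterceptTest(unittest.TestCase):
    @unittest.skipUnless(HAS_MECHANIZE, "mechanize not installed")
    def test_intercept_roundtrip(self):
        pass  # real assertions against the interceptor would go here
```

With this shape a packager without mechanize sees the tests reported as skipped rather than as failures.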
Re: [openstack-dev] [Trove] Should we stop using wsgi-intercept, now that it imports from mechanize? this is really bad!
On Sun, 27 Jul 2014, Thomas Goirand wrote: I don't think you get it. The question isn't to "fix Trove to be ready for Py3.4", we're very far from that. The question is: how can I maintain the python-wsgi-intercept package in Debian, when it now depends on a very bad package in the newer version that I need to upgrade to. And how can I continue to have it work in Debian Jessie for the soon-to-come freeze deadline of the 5th of November. So I'm more concerned by Icehouse right now, and not even remotely thinking about the K release. I maintain wsgi-intercept, and I'm happy to remove mechanize if that's really necessary. I didn't want it in there but when someone asked for it to be back in there was insufficient objection so back in it went. https://github.com/cdent/python3-wsgi-intercept/pull/16 If it is causing problem, then by all means say so. In any case, mechanize shouldn't be _required_, it should just be available if you ask for it. If you don't import wsgi_intercept.mechanize_intercept mechanize will not be loaded, and if you try to do so with Python 3 you'll get an AssertionError and the world will crash around you. Maybe you aren't looking at the most recent version (0.7.0)? If there are issues please report them as bugs on github, they'll get fixed: https://github.com/cdent/python3-wsgi-intercept/issues If there's a better way to do the optional-ness of mechanize (without changing everything else), then please suggest something. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
There's a review in progress for a generic event format for PaaS-services which is a move with the right spirit: allow various services to join the notification party without needing special handlers. See: https://review.openstack.org/#/c/101967/ -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [ceilometer] overuse of 'except Exception'
On Thu, 24 Jul 2014, Doug Hellmann wrote: I don’t claim any special status except that I was there and am trying to provide background on why things are as they are. :-) I think that counts and I very much appreciate the responses. Having a hard-fail error handler is useful in situations where continuing operation would make the problem worse *and* the deployer can fix the problem. Being unable to connect to a database might be an example of this. However, we don’t want the ceilometer collector to shutdown if it receives a notification it doesn’t understand because discarding a malformed message isn’t going to make other ceilometer operations any worse, and seeing such a message isn’t a situation the deployer can fix. Catching KeyError, IndexError, AttributeError, etc. for those cases would be useful if we were going to treat those exception types differently somehow, but we don’t. I guess what I'm asking is "shouldn't we treat them differently?" If I've got a dict coming in over the bus and it is missing a key, big deal, the bus is still working. I _do_ want to know about it but it isn't a disaster so I can (and should) catch KeyError and log a short message (without traceback) that is specially encoded to say "how about that, the notification payload was messed up". Maybe such a thing is elsewhere in the stack, if so, great. In that case the code I pointed out as a bit of a compromise is in place, just not in the same place. What I don't want is a thing logged as _exceptional_ when it isn't. That said, it’s possible we could tighten up some of the error handling outside of event processing loops, so as I said, if you have a more specific proposal for places you think we can predict the exception type reliably, we should talk about those directly instead of the general case. You mention distinguishing between “the noise and the nasty” — do you have some logs you can share? 
Only vaguely at this point, based on my experiences in the past few days trying to chase down failures in the gate. There's just so much logged, a lot of which doesn't help, but at the same time a fair bit which looks like it ought to be a traceback and handled more aggressively. That experience drove me into the Ceilometer code in an effort to check the hygiene there and see if there was something I could do in that small environment (rather than the overwhelming context of the The Entire Project™). I'll pay a bit closer attention to the specific relationship between the ceilometer exceptions (on the loops) and the logs and when I find something that particularly annoys me, I'll submit a patch for review and we'll see how it goes. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [ceilometer] overuse of 'except Exception'
On Wed, 23 Jul 2014, Doug Hellmann wrote: That's bad enough, but much worse, this will catch all sorts of exceptions, even ones that are completely unexpected and ought to cause a more drastic (and thus immediately informative) failure than 'something failed’. In most cases, we chose to handle errors this way to keep the service running even in the face of “bad” data, since we are trying to collect an audit stream and we don’t want to miss good data if we encounter bad data. a) I acknowledge that you're actually one of the "elders" to whom I referred earlier so I hesitate to disagree with you here, so feel free to shoot me down, but... b) "keep the service running" in the face of "bad" is exactly the sort of reason why I don't like this idiom. I think those exceptions which we can enumerate as causes of "bad" should be explicitly caught and explicitly logged and the rest of them should explicitly cause death exactly because we don't know what happened and the situation is _actually_ exceptional and we ought to know now, not later, that it happened, and not some number of minutes or hours or even days later when we notice that some process, though still running, hasn't done any real work. That kind of "keep it alive" rationale often leads to far more complex debugging situations than otherwise. In other words there are two kinds of "bad": The bad that we know and can expect (even though we don't want it) and the bad that we don't know and shouldn't expect. These should be handled differently. A compromise position (if one is needed) would be something akin to, but not exactly like:

    except (TheVarious, ExceptionsIKnow) as exc:
        LOG.warning('shame, no workie, but you know, it happens: %s', exc)
    except Exception:
        LOG.exception('crisis!')

This makes it easier to distinguish between the noise and the nasty, which I've found to be quite challenging thus far. 
-- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
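A runnable illustration of the two-tier handling sketched in the message above; the payload shape, function name, and exception choices are illustrative, not ceilometer's actual code:

```python
import logging

logging.basicConfig(level=logging.WARNING)
LOG = logging.getLogger(__name__)

# Known "bad data" failures get a quiet warning and processing continues;
# anything unexpected propagates so it fails loudly and immediately.
def process_sample(payload):
    try:
        return payload["counter_volume"] * 2
    except (KeyError, TypeError) as exc:
        LOG.warning("malformed sample payload, skipping: %s", exc)
        return None
```

A missing key or a non-dict payload produces one short warning line with no traceback, while a genuinely unexpected error still crashes immediately.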
[openstack-dev] [ceilometer] overuse of 'except Exception'
I was having a bit of a browse through the ceilometer code and noticed there are a fair few instances (sixty-some) of `except Exception` scattered about. While not as evil as a bare except, my Python elders always pointed out that doing `except Exception` is a bit like using a sledgehammer where something more akin to a gavel is what's wanted. The error condition is obliterated but there's no judgement on what happened and no apparent effort by the developer to effectively handle discrete cases. A common idiom appears as:

    except Exception:
        LOG.exception(_('something failed'))
        return  # or continue

There's no information here about what failed or why. That's bad enough, but much worse, this will catch all sorts of exceptions, even ones that are completely unexpected and ought to cause a more drastic (and thus immediately informative) failure than 'something failed'. So, my question: Is this something we who dig around in the ceilometer code ought to care about and make an effort to clean up? If so, I'm happy to get started. Thanks. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
On Tue, 15 Jul 2014, Sandy Walsh wrote: This looks like a particular schema for one event-type (let's say "foo.sample"). It's hard to extrapolate this one schema to a generic set of common metadata applicable to all events. Really the only common stuff we can agree on is the stuff already there: tenant, user, server, message_id, request_id, timestamp, event_type, etc. This is pretty much what I'm trying to figure out. We can, relatively agree on a small set of stuff (like what you mention). Presumably there are three more sets: * special keys that could be changed to something more generally meaningful if we tried hard enough * special keys that really are special and must be saved as such * special keys that nobody cares about and can be tossed Everybody thinks their own stuff is special[1] but it is often the case that it's not. In your other message you linked to http://paste.openstack.org/show/54140/ which shows some very complicated payloads (but only gets through the first six events). Is there related data (even speculative) for how many of those keys are actually used? And just looking at the paste (and the problem) generally, does it make sense for the accessors in the dictionaries (the keys) to be terms which are specific to the producer? Obviously that will increase the appearance of disjunction between different events. A different representation might not be as problematic. Or maybe I'm completely wrong, just thinking out loud. This way, we can keep important notifications in a priority queue and handle them accordingly (since they hold important data), but let the samples get routed across less-reliable transports (like UDP) via the RoutingNotifier. Presumably a more robust, uh, contract, for notifications, will allow them to be dispatched (and re-dispatched) more effectively. Also, send the samples one-at-a-time and let them either a) drop on the floor (udp) or b) let the aggregator roll them up into something smaller (sliding window, etc). 
Making these large notifications contain a list of samples means we had to store state somewhere on the server until transmission time. Ideally something we wouldn't want to rely on. I've wondered about this too. Is there history for why some of the notifications which include samples are rolled up lists instead of fired off one at a time? Seems like that will hurt parallelism opportunities? [1] There's vernacular here that I'd prefer to use but this is a family mailing list. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
On Tue, 15 Jul 2014, Mark McLoughlin wrote: So you're proposing that all payloads should contain something like: [...] a class, type, id, value, unit and a space to put additional metadata. That's the gist of it, but I'm only presenting that as a way to get somebody else to point out what's wrong with it so we can get closer to what's actually needed... On the subject of "notifications as a contract", calling the additional metadata field 'extra' suggests to me that there are no stability promises being made about those fields. Was that intentional? ...and as you point out, if everything that doesn't fit in the "known" fields goes in 'extra' then the goal of contractual stability may be lost. What I think that shows us is that what we probably want is three levels of contract. Currently we have one:

* There is a thing called a notification and it has a small number of top-level fields including 'payload'.

At the second level would be:

* There is a list of things _in_ the payload which are events and they have some known general structure that allows ingestion (as data) by lots of consumers.

And the third level would be:

* Each event has an internal structure (below the general structure) which is based on its type.

In the simplest cases (some meters for example) a third level could either not be necessary or at least very small[1]. This is the badly named "extras" above. Basically: If people are willing to pay the price (in terms of changes) for contractual stability may as well get some miles out of it. Three layers of abstraction means there can be three distinct levels in applications or tools, each of which are optimized to a different level of the topology: transporting messages, persisting/publishing messages, extracting meaning from messages. That's kind of present already, but it is done in a way that requires a lot of duplication of knowledge between producer and consumer and within different parts of the consumer. 
Which makes effective testing and scaling more complex than it ought to be. I know from various IRC chatter that this is a perennial topic round these parts, frequently foundering, but perhaps each time we get a bit further, learn a bit more? In any case, as Eoghan and I stated elsewhere in the thread I'm going to try to drive this forward, but I'm not going to rush it as there's no looming deadline. [1] Part of the reason I drove us off into the so-called weeds earlier in the thread is a sense that a large number of events/samples/notification-payloads are capable of being classed as nearly the same thing (except for their name and units) and thus would not warrant their own specific schema. These are "just" events that should have a very similar structure. If that structure is well known and well accepted we will likely find that many existing events can be bent to fit that structure for the sake of less code and more reuse. As part of this process I'll try to figure out if this is true. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
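A purely illustrative sketch of what the three levels of contract discussed above might look like as data; the class/type/id/value/unit and 'extra' field names come from the thread, everything else (the event type, the values) is invented for illustration and is in no way a settled schema:

```python
# Level 1: the notification envelope with its small set of top-level fields.
# Level 2: a list of events in the payload sharing one general structure.
# Level 3: type-specific detail confined to the (badly named) "extra" field.
notification = {
    "message_id": "abc-123",                  # level 1: envelope
    "event_type": "compute.instance.exists",
    "timestamp": "2014-07-15T12:00:00Z",
    "payload": [                              # level 2: common event shape
        {
            "class": "gauge",
            "type": "instance",
            "id": "instance-0001",
            "value": 1,
            "unit": "instance",
            "extra": {"flavor": "m1.small"},  # level 3: per-type detail
        },
    ],
}
```

A transport layer would only touch level 1, a generic consumer could index on level 2, and only type-aware tooling would ever need to look inside level 3.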
Re: [openstack-dev] [qa] Getting rolling with javelin2
On Mon, 14 Jul 2014, Sean Dague wrote: Javelin2 lives in tempest, currently the following additional fixes are needed for it to pass the server & image creation in grenade - https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:javelin_img_fix,n,z Thanks for the pointers. That stuff looks good (and is now merged) and I'm testing my changes against the new shiny. Those were posted for review last Friday, need eyes on them. This is still basically the minimum viable code there, and additional unit tests should be added. Assistance there appreciated. I have to admit I'm struggling to get my head around _how_ to unit something that is itself a test. Is the idea to mock the clients? I'm not sure how much value that will have (compared to just running the thing). There is a grenade patch that will consume that once landed - https://review.openstack.org/#/c/97317/ - local testing gets us to an unrelated ceilometer bug. However landing the 2 tempest patches first should be done. If you'd like me to look into that ceilometer bug, please let me know what it is. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
On Sat, 12 Jul 2014, Eoghan Glynn wrote: So what we need to figure out is how exactly this common structure can be accommodated without reverting back to what Sandy called the "wild west" in another post. I got the impression that "wild west" is what we've already got (within the payload)? For example you could write up a brief wiki walking through how an existing widely-consumed notification might look under your vision, say compute.instance.start.end. Then post a link back here as an RFC. Or, possibly better, maybe submit up a strawman spec proposal to one of the relevant *-specs repos and invite folks to review in the usual way? Would oslo-specs (as in messaging) be the right place for that? My thinking is the right thing to do is bounce around some questions here (or perhaps in a new thread if this one has gone far enough off track to have dropped people) and catch up on some loose ends. For example: It appears that CADF was designed for this sort of thing and was considered at some point in the past. It would be useful to know more of that story if there are any pointers. My initial reaction is that CADF has the stank of enterprisey all over it rather than "less is more" and "worse is better" but that's a completely uninformed and thus unfair opinion. Another question (from elsewhere in the thread) is if it is worth, in the Ironic notifications, to try and cook up something generic or to just carry on with what's being used. This feels like something that we should be thinking about with an eye to the K* cycle - would you agree? Yup. Thanks for helping to tease this all out and provide some direction on where to go next. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [qa] Getting rolling with javelin2
A recent thread about javelin2[1] ended with "the takeaway here is that we should consider javelin2 still very much a WIP...we should be hesitant to add new features and functionality to it." One of the items on my todo list is to add some functionality (for ceilometer[2]) to javelin2, therefore I'd like to help in whatever way to make it more mature and useful and move it along. To that end I have some questions: * I understand that javelin2 is or will be run as part of Grenade. Where (what code) do I look to see that integration? If it hasn't happened yet, where will it happen? Will that integration be done as if javelin2 is part of a test suite or will a bit of shell code be wrapping it and checking exit codes? * Will grenade provide its own resources.yaml or is the plan to use the one currently found in the tempest repo? Basically I'd like to see this move along and I'm happy to do the leg work to make it so, but I need a bit of guidance on where to push. Thanks. [1] http://lists.openstack.org/pipermail/openstack-dev/2014-July/039078.html [2] The TC did some gap analysis and one of the areas that needs work is in resource survivability. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
me and have confidence they are doing it right. Can you start fleshing out exactly what you mean by a standard not necessarily predisposed to ceilometer, still sufficiently close to eliminate the need for a mapping layer on the ceilometer side? That's what I'm trying to do here in this discussion on this list because, as far as I'm able to tell, that's the one way to ensure that the idea has some validity and that I'm not missing important details. I thought I was saying "how about this general idea, any merit? Help me flesh it out?" but apparently it came across as something different. That's a shame. If I were made of less stern stuff I might be put off trying to share ideas and make improvements: the hoops I'm leaping through are making me feel a bit "whoa, man, really I don't want to have to try this hard just to discuss an idea". And then please run this by the other stake-holders (e.g. StackTach) for a sanity check to confirm that it wouldn't complicate their lives unnecessarily. I would have thought that was what I was already doing by posting in this thread, based on the title and the participants thus far, but I'm assuming you mean something else? I don't think so, it was supposed to be an example of the cost of there not being an existing general standard. If there were such a standard I wouldn't have had to write any code, only the Ironic folk would and I would have had the free time to help them. Less code == good! Yes less code is great as long as we don't sacrifice the flexibility that we and other consumers need. This ("flexibility") is a topic for a different conversation; I don't want to throw us off this thread. Similarly, if there are individual schemas for the various notification types, every time someone wants to make a new notification they need to get that schema validated, various agents and actors need to be informed of its existence, and then the new thing needs to be integrated. That is limiting. 
If the contract can be managed at a higher layer of abstraction more agents and actors can contribute. I don't really know what a "higher layer of abstraction" actually means when we talk about schema. More generic, less specific? Something else? More generic and less specific. The schema is for NotificationEvent, not ComputeInstanceCreateStartNotificationEvent. OK just being devil's advocate for a second, it sounds like a truism to state we wouldn't need mapping logic if the mapping was done elsewhere. Of course. The reason for doing that is because the publisher should be the source of authority on the mapping from specific-thing-that-happened-here to NotificationEvent, not Ceilometer. If it is Ceilometer then it means that StackTach must also keep its own mapping. And so does every other consumer. The general provision here is to get more DRY about information authority. If you can figure out how to do that in a concrete way ... then great, I'd be receptive, let's hear some proposals. I've made a very limited one, how's it sound? Does it need an implementation in order to warrant further discussion? Or would it be better to toss it around a bit more? It makes no sense to me to formalize something that has no potential legs. Thanks for sticking through this. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
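P.S. To make "more generic, less specific" concrete, here's a rough sketch. All names and fields are purely illustrative, not a proposal for an actual wire format:

```python
# One generic NotificationEvent envelope rather than a distinct schema per
# event type. The publisher owns the mapping from the specific thing that
# happened to this shape; consumers only need to validate the envelope.
from datetime import datetime, timezone

def make_notification_event(event_type, publisher_id, payload):
    """Wrap any event, e.g. compute.instance.create.start, in one envelope."""
    return {
        'event_type': event_type,
        'publisher_id': publisher_id,  # the source of authority
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'payload': payload,            # event-specific details live in here
    }

def is_valid_event(event):
    """The one check every consumer (Ceilometer, StackTach, ...) can share."""
    required = {'event_type', 'publisher_id', 'timestamp', 'payload'}
    return required <= set(event) and isinstance(event.get('payload'), dict)

event = make_notification_event(
    'compute.instance.create.start', 'nova-compute.host1',
    {'instance_id': 'abc123'})
assert is_valid_event(event)
```

That is, one schema and one validator shared by all consumers, instead of one per event type per consumer.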
Re: [openstack-dev] [all] Treating notifications as a contract
On Fri, 11 Jul 2014, Lucas Alvares Gomes wrote: The data format that Ironic will send was part of the spec proposed and could have been reviewed. I think there's still time to change it tho, if you have a better format talk to Haomeng, who is the guy responsible for that work in Ironic, and see if he can change it (We can put up a follow-up patch to fix the spec with the new format as well). But we need to do this ASAP because we want to get it landed in Ironic soon. It was only after doing the work that I realized how it might be an example for the sake of this discussion. As the architecture of Ceilometer currently exists there still needs to be some measure of custom code, even if the notifications are as I described them. However, if we want to take this opportunity to move some of the smarts from Ceilometer into the Ironic code then the paste that I created might be a guide to make it possible: http://paste.openstack.org/show/86071/ However on that however, if there's some chance that a large change could happen, it might be better to wait, I don't know. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [all] Treating notifications as a contract
If there were such a standard I wouldn't have had to write any code, only the Ironic folk would and I would have had the free time to help them. Less code == good! Similarly if there are individual schema for the various notification types, every time someone wants to make a new notification they need to get that schema validated and various agents and actors need to be informed of its existence and then the new thing needs to be integrated. That is limiting. If the contract can be managed at a higher layer of abstraction more agents and actors can contribute. i.e. not that ceilometer requires some translation to be done, but that this translation must be hand-crafted in Python code as opposed to being driven declaratively via some configured mapping rules? I don't think there needs to be much, if any, mapping on the consumption side of the notification process if there is a standard form in which those notifications are emitted. In those cases where pipeline transformation needs to be done (multiple value gathering) the pipeline can consume certain notifications and then emit more as the result of the transformation. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
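P.S. The pipeline transformation case might look like this. This is just the shape of the idea with hypothetical names, not real ceilometer pipeline code:

```python
# A transformer that consumes certain notifications and emits a new derived
# one, so no hand-crafted per-type mapping code is needed on the consumer
# side: the pipeline works against the standard form only.

def transform_average(notifications, event_type, emit):
    """Gather the values of all matching notifications, emit their average."""
    values = [n['payload']['value'] for n in notifications
              if n['event_type'] == event_type]
    if values:
        emit({'event_type': event_type + '.avg',
              'payload': {'value': sum(values) / len(values)}})

emitted = []
transform_average(
    [{'event_type': 'hardware.temperature', 'payload': {'value': 40}},
     {'event_type': 'hardware.temperature', 'payload': {'value': 60}},
     {'event_type': 'hardware.fan_speed', 'payload': {'value': 9000}}],
    'hardware.temperature', emitted.append)
assert emitted[0]['payload']['value'] == 50.0
```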
Re: [openstack-dev] [all] Treating notifications as a contract
A standard notification payload format would of course mean change, but we know that flexible metering/auditing is very important for the OpenStack universe. Your argument seems to be that having such a standard, predisposed to ceilometer, would limit flexibility and lose capability. I'm suggesting that it would (after the hump of admittedly quite a lot of work) increase flexibility and save resources for focusing on other capabilities (actually using the gathered data to do interesting things). Currently Ceilometer is required to know far too much about the notifications it receives and that knowledge is being represented in code. That is a BadThing™. I'm sure there are plenty of reasons for why it has turned out that way, but if there is an opportunity for change...? Not really, I don't think, TBH. Take my example from above, the processing of Ironic notifications. I think it is weird that I had to write code for that. Does it not seem odd to you? If we had a standard format, we could have said, in response to the initial proposal, "looks good but if you make the data look like _this_ that would be totally wicked sweet!" -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [nova] fastest way to run individual tests ?
On Thu, 10 Jul 2014, Mike Bayer wrote: I typically never use tox or testtools at the commandline until I'm ready to commit and want to see what the jenkins builds will see. I start up the whole thing and then it's time to take a break while it reinvents the whole world. Me too. I've been squeezing py.test into my testing as well. It allows me to do TDD off the test file of the thing that I'm creating or changing and focus on just that without the incredibly long round trip time for tox and friends. I do some variation on: py.test -svx path/to/test/file.py with a pre-warmed virtualenv. My next hope is to get rid of unittest and just do the plain asserts that py.test makes so nice and lovely. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
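P.S. For anyone who hasn't tried it, the contrast between the two styles looks something like this (illustrative code, not from any real test file):

```python
# unittest style: class boilerplate plus the assertEqual family of methods.
import unittest

def scale(values, factor):
    return [v * factor for v in values]

class TestScale(unittest.TestCase):
    def test_scale(self):
        self.assertEqual(scale([1, 2], 3), [3, 6])

# py.test style: a bare function and a plain assert. On failure py.test
# re-evaluates the expression and shows both sides of the comparison, so
# no assertEqual-style helpers are needed.
def test_scale_plain():
    assert scale([1, 2], 3) == [3, 6]
```

Run either with the same invocation as above: py.test -svx path/to/test/file.py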
Re: [openstack-dev] [all] Treating notifications as a contract
On Thu, 10 Jul 2014, Julien Danjou wrote: My initial plan was to leverage a library like voluptuous to do schema based validation on the sender side. That would allow receivers to introspect the schema and know the data structure to expect. I didn't think deeply on how to handle versioning, but that should be doable too. It's not clear to me in this discussion what it is that is being versioned, contracted or standardized. Is it each of the many different notifications that various services produce now? Is it the general concept of a notification which can be considered a "sample" that something like Ceilometer or StackTach might like to consume? If it is not the latter, why isn't it the latter? Here's some semi-random noodling: Wouldn't the metering process be a lot easier if there was a standardized package for a "sample" and anyone with the proper credentials could drop a sample on the bus with the right exchange and the right topic, and anything (e.g. Ceilometer, StackTach, the NewShinyMeteringShiz) that wants to consider itself a metering store could consume it and hey presto? If people are going to have to write a bunch of new tests and related code to get notifications healthier, why not make notifications for metrics _healthy_ and available to any system without needing to write a bunch of code on both sides of the bus? Currently Ceilometer is required to know far too much about the notifications it receives and that knowledge is being represented in code. That is a BadThing™. I'm sure there are plenty of reasons for why it has turned out that way, but if there is an opportunity for change...? -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
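P.S. A sketch of the "standardized package for a sample" noodling above. Field names are illustrative only (loosely modelled on the kind of thing a metering sample needs), not a proposed format:

```python
# One well-known sample shape that any credentialed service can drop on the
# bus and any metering store can consume, without bespoke translation code
# on either side of the bus.
from datetime import datetime, timezone

def make_sample(name, volume, unit, resource_id, project_id=None):
    """Build the one generic sample envelope a metering store would accept."""
    return {
        'counter_name': name,        # e.g. 'hardware.ipmi.temperature'
        'counter_volume': volume,    # the measured value
        'counter_unit': unit,
        'resource_id': resource_id,  # what was measured
        'project_id': project_id,    # who to account it to, if anyone
        'timestamp': datetime.now(timezone.utc).isoformat(),
    }

# Any producer -- nova, ironic, or the NewShinyMeteringShiz of tomorrow --
# emits the same shape:
sample = make_sample('hardware.ipmi.temperature', 42, 'C', 'node-1')
assert {'counter_name', 'counter_volume', 'timestamp'} <= set(sample)
```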
Re: [openstack-dev] [qa] issues adding functionality to javelin2
On Tue, 1 Jul 2014, Matthew Treinish wrote: In the mean-time you can easily re-enable the full tracebacks by setting both verbose and debug logging in the tempest config file. Is there a way to say, via config, "no, I really do want exceptions to cause the code to exit and pooh on the console"? Second thing: When run as above the path to image files is insufficient for them to be loaded. I overcame this by hardcoding a BASENAME (see my review in progress[1]). Note that because of the swallowed exceptions you can run (in create or check) and not realize that no image files were found. The code silently exits. Why? Looking at the code if you use a full path for the image location in the yaml file it should just call open() on it. I can see an issue if you're using relative paths in the yaml, which I think is probably the problem. Sure, but the resources.yaml file is presented as if it is canonical, and in that canonical form it can't work. Presumably it should either work or state that it can't work. Especially since it doesn't let you know that it didn't work when it doesn't work (because of the swallowed errors). So I think the takeaway here is that we should consider javelin2 still very much a WIP. It's only been a few weeks since it was initially merged and it's still not stable enough so that we can gate on it. Until we are running it in some fashion as part of normal gating then we should be hesitant to add new features and functionality to it. I'll press pause on the ceilometer stuff for a while. Thanks for the quick response. I just wanted to make sure I wasn't crazy. I guess not, at least not because of this stuff. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [qa] issues adding functionality to javelin2
I've been working to add ceilometer checks to javelin2. Doing so has revealed some issues that appear to be a fairly big deal, but I suppose there's some chance I'm doing things completely wrong. For reference my experiments are being done with a devstack with ceilometer enabled, running javelin as: python tempest/cmd/javelin.py -m check \ -r tempest/cmd/resources.yaml replace "check" with "create" as required. First thing I noticed: setting sys.excepthook in setup() in tempest/openstack/common/log.py is causing exceptions to be swallowed such that when making simple runs it is not obvious that things have gone wrong. You can check $? and then look in tempest.log but the content of tempest.log is just the exception message, not its type nor any traceback. If you wish to follow along at home comment out line 427 in tempest/openstack/common/log.py. Second thing: When run as above the path to image files is insufficient for them to be loaded. I overcame this by hardcoding a BASENAME (see my review in progress[1]). Note that because of the swallowed exceptions you can run (in create or check) and not realize that no image files were found. The code silently exits. Third thing: Much of the above could still work if there were a different resources.yaml or the PWD was set specifically for test runs. However, this patchset[2] adds support for checking, creating, and attaching volumes. Assuming it is expected to use the volumes API under tempest/services, some of the calls are being made with the wrong number of arguments (client.volumes.{create_volume,attach_volume}). Again these errors aren't obvious because the exceptions are swallowed. I can provide fixes for all this stuff but I wanted to first confirm that I'm not doing something incorrectly or missing something obvious. Some questions: * When javelin will be run officially as part of the tests, what is the PWD, such that we can create an accurate path to the image files? * Is the exception swallowing intentional? 
* When run in grenade will javelin have any knowledge of whether the current check run is happening before or after the upgrade stage? Thanks for any help and input. I'm on IRC as cdent if you want to find me there rather than respond here. [1] https://review.openstack.org/#/c/102354/ [2] https://review.openstack.org/#/c/100105/ -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
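P.S. The swallowing is essentially this pattern (a simplified sketch, not tempest's actual code): an excepthook that logs only str(exc) throws away the exception type and the traceback, which is exactly what ends up (not) in tempest.log:

```python
import traceback

def lossy_hook(exc):
    """What a log-the-message-only excepthook preserves: just str(exc)."""
    return str(exc)

def faithful_hook(exc):
    """What the default hook would have shown: type plus full traceback."""
    return ''.join(
        traceback.format_exception(type(exc), exc, exc.__traceback__))

try:
    raise ValueError('no image files found')
except ValueError as e:
    assert 'Traceback' not in lossy_hook(e)   # message only, no context
    assert 'ValueError' in faithful_hook(e)   # the type survives
    assert 'Traceback' in faithful_hook(e)    # and so does the traceback
```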
[openstack-dev] [ceilometer] [qa] testing ceilometer in javelin2
I've started a spec for using javelin2 with ceilometer to test resource survival: https://review.openstack.org/#/c/100575 As far as I can tell, the current form of javelin2[1] is explicit about a mapping between resources and clients to create and check the existence of those resources. This doesn't map perfectly to what we'd like to do with ceilometer which is, in short, to make two time-bounded api queries before and after the upgrade and confirm that the results are sane[2]. Seems like there are two ways this could be added to javelin2: * Write a ceilo-specific check_ceilo() method on JavelinCheck. * Write a generic web request handler of some kind such that resources.yaml could represent requests and expected responses. The latter seems most flexible but is also presumably a PITA with regard to auth and handling responses in a flexible fashion. What's the recommended path to make this happen? Thanks. [1] https://github.com/openstack/tempest/blob/master/tempest/cmd/javelin.py [2] Definition of "sane" to be determined. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
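P.S. The first option might look roughly like this. query_samples() stands in for the real time-bounded ceilometer API call, and all names are hypothetical, not tempest's actual API:

```python
def check_ceilometer(query_samples, start, end, baseline_count):
    """Re-run a time-bounded query after the upgrade and compare against the
    count recorded before it. 'Sane' here just means no samples were lost."""
    samples = query_samples(start, end)
    assert len(samples) >= baseline_count, (
        'samples lost across upgrade: %d < %d'
        % (len(samples), baseline_count))
    return len(samples)

# Before the upgrade: record a baseline for a fixed time window. After the
# upgrade: the same window must still return at least that many samples.
fake_store = [{'timestamp': 1}, {'timestamp': 5}, {'timestamp': 9}]
query = lambda s, e: [x for x in fake_store if s <= x['timestamp'] <= e]
baseline = check_ceilometer(query, 0, 10, 0)    # pre-upgrade pass
assert check_ceilometer(query, 0, 10, baseline) == 3
```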
Re: [openstack-dev] Gate proposal - drop Postgresql configurations in the gate
On Fri, 13 Jun 2014, Sean Dague wrote: So if we can't evolve the system back towards health, we need to just cut a bunch of stuff off until we can. +1 This is kind of the crux of the biscuit. As things stand there's so much noise that it's far too easy to think and act like it is somebody else's problem. -- Chris Dent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev