Thanks for starting this, Joe. I think that we need to address the operator and user experience by improving the consistency and stability of OpenStack overall. Here are five ways of doing that:

1) Improve log correlation and utility

If we're going to improve the stability of OpenStack, we have to be able to understand what's going on when it breaks. That's true both for developers diagnosing a failure in an integration test and for operators who are all too often diagnosing the same failure in a real deployment. Consistency in logging across projects, as well as a cross-project request token, would go a long way toward this.

2) Improve API consistency

As projects are becoming more integrated (which is happening at least partially as we move functionality _out_ of previously monolithic projects), the APIs between them become more important. We keep generating APIs with different expectations that behave in very different ways across projects. We need to standardize on API behavior and expectations, for the sake of developers of OpenStack who are increasingly using them internally, but even more so for our users, who expect a single API and are bewildered when they get dozens instead.

3) A real SDK

OpenStack is so nearly impossible to use that we have a substantial amount of code in the infrastructure program to do things that, frankly, we are a bit surprised that the client libraries don't do. Just getting an instance with an IP address is an enormous challenge, and something that took us years to get right. We still have problems deleting instances. We need client libraries (an SDK, if you will) and command-line clients that are easy for users to understand and work with, and that hide the gory details of how the sausage is made.

In OpenStack, we have chosen to let a thousand flowers bloom, and deployers have a wide array of implementation options available. However, it's unreasonable to expect all of our users to understand all of the implications of all of those choices. Our SDK must help users deal with that complexity.

4) Reliability

Parts of OpenStack break all the time. In general, we accept that the environment a cloud operates in can be unreliable (we design for failure). However, failure should be the exception, not the norm. Our current failure modes and rates are hurting everyone -- developers merging changes in the gate, operators in continual fire-fighting mode, and users who have to handle and recover from every kind of internal error that OpenStack externalizes. We need to focus on making OpenStack itself operate reliably.

5) Functional testing

We've hit the limit of what we can reasonably accomplish by putting all of our testing efforts into cross-project integration testing. Instead, we need to functionally test individual projects much more strongly, so that we can reserve integration testing (which is much more complicated) for catching real "integration" bugs rather than expecting it to catch all functional bugs. To that end, we should help projects focus on robust functional testing in the Kilo cycle.

-Jim

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev