TL;DR:  I consider the poor state of log consistency a major impediment for 
more widespread adoption of OpenStack and would like to volunteer to own this 
cross-functional process to begin to unify and standardize logging messages and 
attributes for Kilo while dealing with the most egregious issues as the 
community identifies them.



Recap from some mail threads:



From Sean Dague on Kilo cycle goals:

2. Consistency in southbound interfaces (Logging first)



Logging and notifications are south bound interfaces from OpenStack providing 
information to people, or machines, about what is going on.

There is also a 3rd proposed south bound with osprofiler.



For Kilo: I think it's reasonable to complete the logging standards and 
implement them. I expect notifications (which haven't quite kicked off) are 
going to take 2 cycles.



I'd honestly *really* love to see a unification path for all the southbound 
parts, logging, osprofiler, notifications, because there is quite a bit of 
overlap in the instrumentation/annotation inside the main code for all of these.


And from Doug Hellmann:
1. Sean has done a lot of analysis and started a spec on standardizing logging 
guidelines where he is gathering input from developers, deployers, and 
operators [1]. Because it is far enough for us to see real progress, it's a 
good place for us to start experimenting with how to drive cross-project 
initiatives involving code and policy changes from outside of a single project. 
We have a couple of potentially related specs in Oslo as part of the oslo.log 
graduation work [2] [3], but I think most of the work will be within the 
applications.

[1] https://review.openstack.org/#/c/91446/
[2] https://blueprints.launchpad.net/oslo.log/+spec/app-agnostic-logging-parameters
[3] https://blueprints.launchpad.net/oslo.log/+spec/remove-context-adapter



And from James Blair:

1) Improve log correlation and utility



If we're going to improve the stability of OpenStack, we have to be able to 
understand what's going on when it breaks.  That's both true as developers when 
we're trying to diagnose a failure in an integration test, and it's true for 
operators who are all too often diagnosing the same failure in a real 
deployment.  Consistency in logging across projects as well as a cross-project 
request token would go a long way toward this.

While I am not currently managing an OpenStack deployment, writing tests or 
code, or debugging the stack, I have spent many years doing just that.  Through 
QA, Ops and Customer support, I have come to revel in good logging and log 
messages and curse the holes and vagaries in many systems.

Defining/refining logs to be useful and usable is a cross-functional effort 
that needs to include:

·         Operators

·         QA

·         End Users

·         Community managers

·         Tech Pubs

·         Translators

·         Developers

·         TC (which provides the forum and impetus for all the projects to 
cooperate on this)

At the moment, I think this effort may work best under the auspices of Oslo 
(oslo.log), but I'd love to hear other proposals.

Here are the beginnings of my proposal for how to attack and subdue the painful 
state of logs:


·         Post this email to the MLs (dev, ops, enduser) to get feedback, 
garner support and participants in the process
(Done;-)

·         In parallel:

o   Collect up problems, issues, ideas, solutions on an etherpad 
https://etherpad.openstack.org/p/Log-Rationalization where anyone in the 
communities can post.

o   Categorize reported log issues into classes (already identified classes):

§  Format Consistency across projects

§  Log level definition and categorization across classes

§  Time syncing entries across tens of logfiles

§  Relevancy/usefulness of information provided within messages

§  Etc (missing a lot here, but I'm sure folks will speak up)

o   Analyze existing log message formats, standards across integrated projects

o   File bugs where issues identified are actual project bugs

o   Build a session outline for F2F working session at the Paris Design Summit

·         At the Paris Design Summit, use a session and/or pod discussions to 
set priorities, recruit contributors, start and/or flesh out specs and 
blueprints

·         Proceed according to priorities, specs, blueprints, contributions and 
changes as needed as the work progresses.

·         Keep an active and open rapport and reporting process for the user 
community to comment and participate in the processes.

Measures of success:

·         Log messages are consistent enough in format for productive 
mining through operator-writable scripts

·         Problem debugging is simplified through the ability to trust 
timestamps across all OpenStack logs (and use scripts to get to the time you 
want in any/all of the logfiles)

·         Standards for format, content, levels and translations have been 
proposed and agreed to be adopted across all OpenStack integrated projects

·         The user communities demonstrate an increased level of trust and 
decreased level of frustration with OpenStack logging (surveys, bug reports, 
other measures?)

·         The log team can disband
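
To make the first two measures concrete, here is a rough sketch of the kind of operator-writable script that a consistent format and trustworthy timestamps would allow. The line layout in LINE_RE, the bracketed request id, and the helper names parse() and merged() are all assumptions made up for illustration; none of this is an agreed OpenStack standard or an oslo.log API.

```python
import heapq
import re
from datetime import datetime

# Assumed line layout (hypothetical, for illustration only):
#   2014-10-01 12:00:00.123 INFO nova.compute [req-abc123] message text
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) (?P<name>\S+) \[(?P<req>[^\]]*)\] (?P<msg>.*)$"
)


def parse(line):
    """Return (timestamp, fields) for a matching line, or None otherwise."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts, m.groupdict()


def merged(logs, request_id=None):
    """Merge per-service line iterables into one time-ordered stream.

    `logs` maps a service label to an iterable of lines that is already
    sorted by time, which only holds if the hosts' clocks agree.
    """
    def records(label, lines):
        for line in lines:
            parsed = parse(line)
            if parsed is None:
                continue  # skip lines that do not follow the assumed format
            ts, fields = parsed
            if request_id is None or fields["req"] == request_id:
                yield ts, label, fields["level"], fields["msg"]

    streams = [records(label, lines) for label, lines in logs.items()]
    return heapq.merge(*streams)


if __name__ == "__main__":
    nova = [
        "2014-10-01 12:00:00.100 INFO nova.api [req-1] accepted boot request\n",
        "2014-10-01 12:00:02.500 ERROR nova.compute [req-1] spawn failed\n",
    ]
    neutron = [
        "2014-10-01 12:00:01.200 INFO neutron.api [req-1] port created\n",
        "2014-10-01 12:00:01.900 INFO neutron.api [req-2] port deleted\n",
    ]
    logs = {"nova": nova, "neutron": neutron}
    for ts, label, level, msg in merged(logs, request_id="req-1"):
        print(ts.time(), label, level, msg)
```

The point of the sketch is that a single regular expression and a single timestamp format are enough to interleave several services' logs and follow one request through them. Today each project would need its own pattern, and the merge is only meaningful if the timestamps can be trusted.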

I expect that getting the logs in very good shape will take more than just the 
Kilo timeframe, but once momentum has built, which should happen during Kilo, 
the process should move very quickly.  A lot of this could be handled through 
"while you're in there" or "low hanging fruit" once the standards are 
established.  The bigger win will be if we can ensure what we define/design is 
extensible over the longer life of OpenStack.

--Rocky

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
