Re: [openstack-dev] Log Rationalization -- Bring it on!
On Sep 17, 2014, at 7:42 PM, Rochelle.RochelleGrober rochelle.gro...@huawei.com wrote:

TL;DR: I consider the poor state of log consistency a major impediment for more widespread adoption of OpenStack and would like to volunteer to own this cross-functional process to begin to unify and standardize logging messages and attributes for Kilo while dealing with the most egregious issues as the community identifies them.

Recap from some mail threads:

From Sean Dague on Kilo cycle goals:

2. Consistency in southbound interfaces (Logging first)

Logging and notifications are southbound interfaces from OpenStack providing information to people, or machines, about what is going on. There is also a 3rd proposed southbound with osprofiler.

For Kilo: I think it's reasonable to complete the logging standards and implement them. I expect notifications (which haven't quite kicked off) are going to take 2 cycles.

I'd honestly *really* love to see a unification path for all the southbound parts, logging, osprofiler, notifications, because there is quite a bit of overlap in the instrumentation/annotation inside the main code for all of these.

And from Doug Hellmann:

1. Sean has done a lot of analysis and started a spec on standardizing logging guidelines where he is gathering input from developers, deployers, and operators [1]. Because it is far enough for us to see real progress, it’s a good place for us to start experimenting with how to drive cross-project initiatives involving code and policy changes from outside of a single project. We have a couple of potentially related specs in Oslo as part of the oslo.log graduation work [2] [3], but I think most of the work will be within the applications.

[1] https://review.openstack.org/#/c/91446/
[2] https://blueprints.launchpad.net/oslo.log/+spec/app-agnostic-logging-parameters
[3] https://blueprints.launchpad.net/oslo.log/+spec/remove-context-adapter

And from James Blair:

1) Improve log correlation and utility

If we're going to improve the stability of OpenStack, we have to be able to understand what's going on when it breaks. That's both true as developers when we're trying to diagnose a failure in an integration test, and it's true for operators who are all too often diagnosing the same failure in a real deployment. Consistency in logging across projects as well as a cross-project request token would go a long way toward this.

While I am not currently managing an OpenStack deployment, writing tests or code, or debugging the stack, I have spent many years doing just that. Through QA, Ops and Customer support, I have come to revel in good logging and log messages and curse the holes and vagaries in many systems.

Defining/refining logs to be useful and usable is a cross-functional effort that needs to include:

· Operators
· QA
· End Users
· Community managers
· Tech Pubs
· Translators
· Developers
· TC (which provides the forum and impetus for all the projects to cooperate on this)

At the moment, I think this effort may best work under the auspices of Oslo (oslo.log); I’d love to hear other proposals.

I’m sure there will be changes to make in the log library. However, because of the cross-project nature of the policy decisions, I think we should drive this from outside of Oslo. We can use the oslo.log developer docs as a place to formally document guidelines, and we can change the library to make it easier to follow those guidelines, but the specs to define the guidelines and the planning for rolling out the changes should happen in a more central place than oslo-specs.

Here are the beginnings of my proposal for how to attack and subdue the painful state of logs:

· Post this email to the MLs (dev, ops, enduser) to get feedback, garner support and participants in the process (Done;-)

FWIW, I’m only replying on the -dev list to avoid duplicate messages from cross-posting. Figuring out how to gather input and collect it is one of the procedural issues we need to work out as part of starting an initiative like this. I like that you’ve started an etherpad for that.

We really do need to have the meta conversation about running cross-project initiatives, and I think this one has enough clear support that we could have that discussion without being side-tracked by what the initiative is trying to accomplish.

Doug

· In parallel:
    o Collect up problems, issues, ideas, solutions on an etherpad https://etherpad.openstack.org/p/Log-Rationalization where anyone in the communities can post.
    o Categorize reported Log issues into classes (already identified classes):
        § Format Consistency across projects
        § Log level definition and categorization across classes
        § Time syncing entries across tens of logfiles
        § Relevancy/usefulness of
Re: [openstack-dev] Log Rationalization -- Bring it on!
On 09/17/2014 08:48 PM, John Dickinson wrote:

On Sep 17, 2014, at 8:43 PM, Jay Faulkner j...@jvf.cc wrote:

Comments inline.

-Original Message-
From: Monty Taylor [mailto:mord...@inaugust.com]
Sent: Wednesday, September 17, 2014 7:34 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] Log Rationalization -- Bring it on!

On 09/17/2014 04:42 PM, Rochelle.RochelleGrober wrote:

TL;DR: I consider the poor state of log consistency a major impediment for more widespread adoption of OpenStack and would like to volunteer to own this cross-functional process to begin to unify and standardize logging messages and attributes for Kilo while dealing with the most egregious issues as the community identifies them.

I fully support this, and I, for one, welcome our new log-standardization overlords.

Something that could be interesting is to see if we can emit metrics every time a loggable event happens. There's already a spec+code being drafted for Ironic in Kilo (https://review.openstack.org/#/c/100729/ https://review.openstack.org/#/c/103202/) that we're using downstream to emit metrics from Ironic.

You may be interested to see how Swift has integrated StatsD events into a log adapter. https://github.com/openstack/swift/blob/master/swift/common/utils.py#L1197 See also the StatsdClient class in that same file.

+1000

I'd go so far as to say that everything should really have statsd instrumentation.

If we have good organization of logging events and levels, perhaps there's a way to make it easy for metrics to be emitted at that time as well.

- Jay Faulkner

Recap from some mail threads:

From Sean Dague on Kilo cycle goals:

2. Consistency in southbound interfaces (Logging first)

Logging and notifications are southbound interfaces from OpenStack providing information to people, or machines, about what is going on. There is also a 3rd proposed southbound with osprofiler.

For Kilo: I think it's reasonable to complete the logging standards and implement them. I expect notifications (which haven't quite kicked off) are going to take 2 cycles.

I'd honestly *really* love to see a unification path for all the southbound parts, logging, osprofiler, notifications, because there is quite a bit of overlap in the instrumentation/annotation inside the main code for all of these.

And from Doug Hellmann:

1. Sean has done a lot of analysis and started a spec on standardizing logging guidelines where he is gathering input from developers, deployers, and operators [1]. Because it is far enough for us to see real progress, it's a good place for us to start experimenting with how to drive cross-project initiatives involving code and policy changes from outside of a single project. We have a couple of potentially related specs in Oslo as part of the oslo.log graduation work [2] [3], but I think most of the work will be within the applications.

[1] https://review.openstack.org/#/c/91446/
[2] https://blueprints.launchpad.net/oslo.log/+spec/app-agnostic-logging-parameters
[3] https://blueprints.launchpad.net/oslo.log/+spec/remove-context-adapter

And from James Blair:

1) Improve log correlation and utility

If we're going to improve the stability of OpenStack, we have to be able to understand what's going on when it breaks. That's both true as developers when we're trying to diagnose a failure in an integration test, and it's true for operators who are all too often diagnosing the same failure in a real deployment. Consistency in logging across projects as well as a cross-project request token would go a long way toward this.

While I am not currently managing an OpenStack deployment, writing tests or code, or debugging the stack, I have spent many years doing just that. Through QA, Ops and Customer support, I have come to revel in good logging and log messages and curse the holes and vagaries in many systems.

Defining/refining logs to be useful and usable is a cross-functional effort that needs to include:

· Operators
· QA
· End Users
· Community managers
· Tech Pubs
· Translators
· Developers
· TC (which provides the forum and impetus for all the projects to cooperate on this)

At the moment, I think this effort may best work under the auspices of Oslo (oslo.log); I'd love to hear other proposals.

Here are the beginnings of my proposal for how to attack and subdue the painful state of logs:

· Post this email to the MLs (dev, ops, enduser) to get feedback, garner support and participants in the process (Done;-)
· In parallel:
    o Collect up problems, issues, ideas, solutions on an etherpad https://etherpad.openstack.org/p/Log-Rationalization where anyone in the communities can post.
    o Categorize reported Log issues into classes (already identified classes
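
The idea raised in this message, emitting a metric every time a loggable event happens, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Swift log adapter or the Ironic spec referenced above: the class name `StatsdCountingHandler`, the `openstack` prefix, and the `<prefix>.<logger>.<level>` metric naming are all hypothetical. It relies only on the Python standard library and on the plain-text StatsD counter format (`name:value|c`) sent over UDP.

```python
import logging
import socket


class StatsdCountingHandler(logging.Handler):
    """Hypothetical handler: sends one StatsD counter per log record."""

    def __init__(self, host="127.0.0.1", port=8125, prefix="openstack"):
        super().__init__()
        self.address = (host, port)
        self.prefix = prefix  # assumed naming scheme, not a standard
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def format_metric(self, record):
        # StatsD counter line: <metric name>:<increment>|c
        return "%s.%s.%s:1|c" % (
            self.prefix, record.name, record.levelname.lower())

    def emit(self, record):
        payload = self.format_metric(record).encode("utf-8")
        try:
            self.sock.sendto(payload, self.address)
        except OSError:
            # Metrics are best effort; they must never break logging.
            pass

    def close(self):
        self.sock.close()
        super().close()
```

Attaching the handler with `logging.getLogger("nova.compute").addHandler(StatsdCountingHandler())` would then count events per logger and level without touching the call sites, which is why consistent level definitions matter for this to be useful.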
[openstack-dev] Log Rationalization -- Bring it on!
TL;DR: I consider the poor state of log consistency a major impediment for more widespread adoption of OpenStack and would like to volunteer to own this cross-functional process to begin to unify and standardize logging messages and attributes for Kilo while dealing with the most egregious issues as the community identifies them.

Recap from some mail threads:

From Sean Dague on Kilo cycle goals:

2. Consistency in southbound interfaces (Logging first)

Logging and notifications are southbound interfaces from OpenStack providing information to people, or machines, about what is going on. There is also a 3rd proposed southbound with osprofiler.

For Kilo: I think it's reasonable to complete the logging standards and implement them. I expect notifications (which haven't quite kicked off) are going to take 2 cycles.

I'd honestly *really* love to see a unification path for all the southbound parts, logging, osprofiler, notifications, because there is quite a bit of overlap in the instrumentation/annotation inside the main code for all of these.

And from Doug Hellmann:

1. Sean has done a lot of analysis and started a spec on standardizing logging guidelines where he is gathering input from developers, deployers, and operators [1]. Because it is far enough for us to see real progress, it's a good place for us to start experimenting with how to drive cross-project initiatives involving code and policy changes from outside of a single project. We have a couple of potentially related specs in Oslo as part of the oslo.log graduation work [2] [3], but I think most of the work will be within the applications.

[1] https://review.openstack.org/#/c/91446/
[2] https://blueprints.launchpad.net/oslo.log/+spec/app-agnostic-logging-parameters
[3] https://blueprints.launchpad.net/oslo.log/+spec/remove-context-adapter

And from James Blair:

1) Improve log correlation and utility

If we're going to improve the stability of OpenStack, we have to be able to understand what's going on when it breaks. That's both true as developers when we're trying to diagnose a failure in an integration test, and it's true for operators who are all too often diagnosing the same failure in a real deployment. Consistency in logging across projects as well as a cross-project request token would go a long way toward this.

While I am not currently managing an OpenStack deployment, writing tests or code, or debugging the stack, I have spent many years doing just that. Through QA, Ops and Customer support, I have come to revel in good logging and log messages and curse the holes and vagaries in many systems.

Defining/refining logs to be useful and usable is a cross-functional effort that needs to include:

· Operators
· QA
· End Users
· Community managers
· Tech Pubs
· Translators
· Developers
· TC (which provides the forum and impetus for all the projects to cooperate on this)

At the moment, I think this effort may best work under the auspices of Oslo (oslo.log); I'd love to hear other proposals.

Here are the beginnings of my proposal for how to attack and subdue the painful state of logs:

· Post this email to the MLs (dev, ops, enduser) to get feedback, garner support and participants in the process (Done;-)
· In parallel:
    o Collect up problems, issues, ideas, solutions on an etherpad https://etherpad.openstack.org/p/Log-Rationalization where anyone in the communities can post.
    o Categorize reported Log issues into classes (already identified classes):
        § Format Consistency across projects
        § Log level definition and categorization across classes
        § Time syncing entries across tens of logfiles
        § Relevancy/usefulness of information provided within messages
        § Etc (missing a lot here, but I'm sure folks will speak up)
    o Analyze existing log message formats, standards across integrated projects
    o File bugs where issues identified are actual project bugs
    o Build a session outline for F2F working session at the Paris Design Summit
· At the Paris Design Summit, use a session and/or pod discussions to set priorities, recruit contributors, start and/or flesh out specs and blueprints
· Proceed according to priorities, specs, blueprints, contributions and changes as needed as the work progresses.
· Keep an active and open rapport and reporting process for the user community to comment and participate in the processes.

Measures of success:

· Log messages provide consistency of format enough for productive mining through operator-writable scripts
· Problem debugging is simplified through the ability to trust timestamps across all OpenStack logs (and use scripts to get to the time you want in any/all of the logfiles)
· Standards for format, content, levels and translations have been proposed and agreed to be adopted across all
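
To make the timestamp-related measure of success concrete, here is a minimal sketch of the kind of operator script that trustworthy, uniformly formatted timestamps would allow. It assumes, purely for illustration, that every log line starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp and that each file is already in time order; the helper names `parse_line` and `merge_logs` are hypothetical and not part of any OpenStack tool.

```python
import heapq
from datetime import datetime

# Assumed line prefix, e.g. "2014-09-17 19:42:01.123"
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23


def parse_line(line):
    """Return (timestamp, line) for a line that starts with a timestamp."""
    stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    return stamp, line.rstrip("\n")


def merge_logs(*streams):
    """Lazily merge time-ordered iterables of log lines into one stream."""
    parsed = (map(parse_line, stream) for stream in streams)
    # heapq.merge keeps only one pending line per input in memory.
    for _, line in heapq.merge(*parsed):
        yield line
```

Passing open file objects for several service logs, for example `merge_logs(open("nova.log"), open("glance.log"))` with hypothetical file names, yields a single interleaved timeline. The sketch only works if every project agrees on the prefix format and the clocks are in sync, which is exactly what the proposal is asking for.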
Re: [openstack-dev] Log Rationalization -- Bring it on!
On 09/17/2014 04:42 PM, Rochelle.RochelleGrober wrote:

TL;DR: I consider the poor state of log consistency a major impediment for more widespread adoption of OpenStack and would like to volunteer to own this cross-functional process to begin to unify and standardize logging messages and attributes for Kilo while dealing with the most egregious issues as the community identifies them.

I fully support this, and I, for one, welcome our new log-standardization overlords.

Recap from some mail threads:

From Sean Dague on Kilo cycle goals:

2. Consistency in southbound interfaces (Logging first)

Logging and notifications are southbound interfaces from OpenStack providing information to people, or machines, about what is going on. There is also a 3rd proposed southbound with osprofiler.

For Kilo: I think it's reasonable to complete the logging standards and implement them. I expect notifications (which haven't quite kicked off) are going to take 2 cycles.

I'd honestly *really* love to see a unification path for all the southbound parts, logging, osprofiler, notifications, because there is quite a bit of overlap in the instrumentation/annotation inside the main code for all of these.

And from Doug Hellmann:

1. Sean has done a lot of analysis and started a spec on standardizing logging guidelines where he is gathering input from developers, deployers, and operators [1]. Because it is far enough for us to see real progress, it's a good place for us to start experimenting with how to drive cross-project initiatives involving code and policy changes from outside of a single project. We have a couple of potentially related specs in Oslo as part of the oslo.log graduation work [2] [3], but I think most of the work will be within the applications.

[1] https://review.openstack.org/#/c/91446/
[2] https://blueprints.launchpad.net/oslo.log/+spec/app-agnostic-logging-parameters
[3] https://blueprints.launchpad.net/oslo.log/+spec/remove-context-adapter

And from James Blair:

1) Improve log correlation and utility

If we're going to improve the stability of OpenStack, we have to be able to understand what's going on when it breaks. That's both true as developers when we're trying to diagnose a failure in an integration test, and it's true for operators who are all too often diagnosing the same failure in a real deployment. Consistency in logging across projects as well as a cross-project request token would go a long way toward this.

While I am not currently managing an OpenStack deployment, writing tests or code, or debugging the stack, I have spent many years doing just that. Through QA, Ops and Customer support, I have come to revel in good logging and log messages and curse the holes and vagaries in many systems.

Defining/refining logs to be useful and usable is a cross-functional effort that needs to include:

· Operators
· QA
· End Users
· Community managers
· Tech Pubs
· Translators
· Developers
· TC (which provides the forum and impetus for all the projects to cooperate on this)

At the moment, I think this effort may best work under the auspices of Oslo (oslo.log); I'd love to hear other proposals.

Here are the beginnings of my proposal for how to attack and subdue the painful state of logs:

· Post this email to the MLs (dev, ops, enduser) to get feedback, garner support and participants in the process (Done;-)
· In parallel:
    o Collect up problems, issues, ideas, solutions on an etherpad https://etherpad.openstack.org/p/Log-Rationalization where anyone in the communities can post.
    o Categorize reported Log issues into classes (already identified classes):
        § Format Consistency across projects
        § Log level definition and categorization across classes
        § Time syncing entries across tens of logfiles
        § Relevancy/usefulness of information provided within messages
        § Etc (missing a lot here, but I'm sure folks will speak up)
    o Analyze existing log message formats, standards across integrated projects
    o File bugs where issues identified are actual project bugs
    o Build a session outline for F2F working session at the Paris Design Summit
· At the Paris Design Summit, use a session and/or pod discussions to set priorities, recruit contributors, start and/or flesh out specs and blueprints
· Proceed according to priorities, specs, blueprints, contributions and changes as needed as the work progresses.
· Keep an active and open rapport and reporting process for the user community to comment and participate in the processes.

Measures of success:

· Log messages provide consistency of format enough for productive mining through operator-writable scripts
· Problem debugging is simplified through the
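
The cross-project request token mentioned in the James Blair quote can be illustrated with the standard library alone. The following is a sketch under assumptions, not how oslo.log does it: the `request_id` field name, the `RequestIdFilter` class, the `make_logger` helper, and the message layout are all made up for the example.

```python
import logging


class RequestIdFilter(logging.Filter):
    """Stamp every record with a request id (hypothetical field name)."""

    def __init__(self, request_id):
        super().__init__()
        self.request_id = request_id

    def filter(self, record):
        record.request_id = self.request_id
        return True  # never drop the record, only annotate it


def make_logger(name, request_id, stream):
    """Build a logger whose lines all carry the same request id."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s %(name)s [%(request_id)s] %(message)s"))
    handler.addFilter(RequestIdFilter(request_id))
    logger.addHandler(handler)
    return logger
```

If two services built their loggers with the same id, say `make_logger("svc-a", "req-123", sys.stderr)` and `make_logger("svc-b", "req-123", sys.stderr)`, a single search for `req-123` would find the whole request path across their logs. How the token is generated and passed between services is left open here.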
Re: [openstack-dev] Log Rationalization -- Bring it on!
Comments inline.

-Original Message-
From: Monty Taylor [mailto:mord...@inaugust.com]
Sent: Wednesday, September 17, 2014 7:34 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] Log Rationalization -- Bring it on!

On 09/17/2014 04:42 PM, Rochelle.RochelleGrober wrote:

TL;DR: I consider the poor state of log consistency a major impediment for more widespread adoption of OpenStack and would like to volunteer to own this cross-functional process to begin to unify and standardize logging messages and attributes for Kilo while dealing with the most egregious issues as the community identifies them.

I fully support this, and I, for one, welcome our new log-standardization overlords.

Something that could be interesting is to see if we can emit metrics every time a loggable event happens. There's already a spec+code being drafted for Ironic in Kilo (https://review.openstack.org/#/c/100729/ https://review.openstack.org/#/c/103202/) that we're using downstream to emit metrics from Ironic.

If we have good organization of logging events and levels, perhaps there's a way to make it easy for metrics to be emitted at that time as well.

- Jay Faulkner

Recap from some mail threads:

From Sean Dague on Kilo cycle goals:

2. Consistency in southbound interfaces (Logging first)

Logging and notifications are southbound interfaces from OpenStack providing information to people, or machines, about what is going on. There is also a 3rd proposed southbound with osprofiler.

For Kilo: I think it's reasonable to complete the logging standards and implement them. I expect notifications (which haven't quite kicked off) are going to take 2 cycles.

I'd honestly *really* love to see a unification path for all the southbound parts, logging, osprofiler, notifications, because there is quite a bit of overlap in the instrumentation/annotation inside the main code for all of these.

And from Doug Hellmann:

1. Sean has done a lot of analysis and started a spec on standardizing logging guidelines where he is gathering input from developers, deployers, and operators [1]. Because it is far enough for us to see real progress, it's a good place for us to start experimenting with how to drive cross-project initiatives involving code and policy changes from outside of a single project. We have a couple of potentially related specs in Oslo as part of the oslo.log graduation work [2] [3], but I think most of the work will be within the applications.

[1] https://review.openstack.org/#/c/91446/
[2] https://blueprints.launchpad.net/oslo.log/+spec/app-agnostic-logging-parameters
[3] https://blueprints.launchpad.net/oslo.log/+spec/remove-context-adapter

And from James Blair:

1) Improve log correlation and utility

If we're going to improve the stability of OpenStack, we have to be able to understand what's going on when it breaks. That's both true as developers when we're trying to diagnose a failure in an integration test, and it's true for operators who are all too often diagnosing the same failure in a real deployment. Consistency in logging across projects as well as a cross-project request token would go a long way toward this.

While I am not currently managing an OpenStack deployment, writing tests or code, or debugging the stack, I have spent many years doing just that. Through QA, Ops and Customer support, I have come to revel in good logging and log messages and curse the holes and vagaries in many systems.

Defining/refining logs to be useful and usable is a cross-functional effort that needs to include:

· Operators
· QA
· End Users
· Community managers
· Tech Pubs
· Translators
· Developers
· TC (which provides the forum and impetus for all the projects to cooperate on this)

At the moment, I think this effort may best work under the auspices of Oslo (oslo.log); I'd love to hear other proposals.

Here are the beginnings of my proposal for how to attack and subdue the painful state of logs:

· Post this email to the MLs (dev, ops, enduser) to get feedback, garner support and participants in the process (Done;-)
· In parallel:
    o Collect up problems, issues, ideas, solutions on an etherpad https://etherpad.openstack.org/p/Log-Rationalization where anyone in the communities can post.
    o Categorize reported Log issues into classes (already identified classes):
        § Format Consistency across projects
        § Log level definition and categorization across classes
        § Time syncing entries across tens of logfiles
        § Relevancy/usefulness of information provided within messages
        § Etc (missing a lot here, but I'm sure folks will speak up)
    o Analyze existing log message
Re: [openstack-dev] Log Rationalization -- Bring it on!
On Sep 17, 2014, at 8:43 PM, Jay Faulkner j...@jvf.cc wrote:

Comments inline.

-Original Message-
From: Monty Taylor [mailto:mord...@inaugust.com]
Sent: Wednesday, September 17, 2014 7:34 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] Log Rationalization -- Bring it on!

On 09/17/2014 04:42 PM, Rochelle.RochelleGrober wrote:

TL;DR: I consider the poor state of log consistency a major impediment for more widespread adoption of OpenStack and would like to volunteer to own this cross-functional process to begin to unify and standardize logging messages and attributes for Kilo while dealing with the most egregious issues as the community identifies them.

I fully support this, and I, for one, welcome our new log-standardization overlords.

Something that could be interesting is to see if we can emit metrics every time a loggable event happens. There's already a spec+code being drafted for Ironic in Kilo (https://review.openstack.org/#/c/100729/ https://review.openstack.org/#/c/103202/) that we're using downstream to emit metrics from Ironic.

You may be interested to see how Swift has integrated StatsD events into a log adapter. https://github.com/openstack/swift/blob/master/swift/common/utils.py#L1197 See also the StatsdClient class in that same file.

--John

If we have good organization of logging events and levels, perhaps there's a way to make it easy for metrics to be emitted at that time as well.

- Jay Faulkner

Recap from some mail threads:

From Sean Dague on Kilo cycle goals:

2. Consistency in southbound interfaces (Logging first)

Logging and notifications are southbound interfaces from OpenStack providing information to people, or machines, about what is going on. There is also a 3rd proposed southbound with osprofiler.

For Kilo: I think it's reasonable to complete the logging standards and implement them. I expect notifications (which haven't quite kicked off) are going to take 2 cycles.

I'd honestly *really* love to see a unification path for all the southbound parts, logging, osprofiler, notifications, because there is quite a bit of overlap in the instrumentation/annotation inside the main code for all of these.

And from Doug Hellmann:

1. Sean has done a lot of analysis and started a spec on standardizing logging guidelines where he is gathering input from developers, deployers, and operators [1]. Because it is far enough for us to see real progress, it's a good place for us to start experimenting with how to drive cross-project initiatives involving code and policy changes from outside of a single project. We have a couple of potentially related specs in Oslo as part of the oslo.log graduation work [2] [3], but I think most of the work will be within the applications.

[1] https://review.openstack.org/#/c/91446/
[2] https://blueprints.launchpad.net/oslo.log/+spec/app-agnostic-logging-parameters
[3] https://blueprints.launchpad.net/oslo.log/+spec/remove-context-adapter

And from James Blair:

1) Improve log correlation and utility

If we're going to improve the stability of OpenStack, we have to be able to understand what's going on when it breaks. That's both true as developers when we're trying to diagnose a failure in an integration test, and it's true for operators who are all too often diagnosing the same failure in a real deployment. Consistency in logging across projects as well as a cross-project request token would go a long way toward this.

While I am not currently managing an OpenStack deployment, writing tests or code, or debugging the stack, I have spent many years doing just that. Through QA, Ops and Customer support, I have come to revel in good logging and log messages and curse the holes and vagaries in many systems.

Defining/refining logs to be useful and usable is a cross-functional effort that needs to include:

· Operators
· QA
· End Users
· Community managers
· Tech Pubs
· Translators
· Developers
· TC (which provides the forum and impetus for all the projects to cooperate on this)

At the moment, I think this effort may best work under the auspices of Oslo (oslo.log); I'd love to hear other proposals.

Here are the beginnings of my proposal for how to attack and subdue the painful state of logs:

· Post this email to the MLs (dev, ops, enduser) to get feedback, garner support and participants in the process (Done;-)
· In parallel:
    o Collect up problems, issues, ideas, solutions on an etherpad https://etherpad.openstack.org/p/Log-Rationalization where anyone in the communities can post.
    o Categorize reported Log issues into classes (already identified classes):
        § Format Consistency across projects
        § Log level definition and categorization