Re: [openstack-dev] RFC - Icehouse logging harmonization
On Wed, Oct 23 2013, Sean Dague wrote:

> Goals - Short Term
> ===
>
> As much feedback as possible from both core projects and openstack
> deployers about the kinds of things that they believe we should be
> logging, and the kinds of levels they think those things should land at.
>
> Determining how crazy it is to try to harmonize this across services.
>
> Figure out who else wants to help. Where help means:
>
> * helping figure out what's already consensus in services
> * helping figure out things that are really aberrant from that consensus
> * helping build consensus with various core teams on a common
> * helping with contributions to projects that are interested in
>   contributions to move them closer to the consensus
>
> Determining if everyone just hates the idea, and I should give up now. :)
> (That is a valid response to this RFC, feel free to put that out there).

Big +1 from here. We want to leverage the logs from all projects in
Ceilometer by coupling them with notifications in the future, so having
them be of *very* good quality is going to be a must.

-- 
Julien Danjou
// Free Software hacker / independent consultant
// http://julien.danjou.info
Re: [openstack-dev] RFC - Icehouse logging harmonization
> Example 1:
> ==
> n-conductor log in tempest/devstack -
> http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-n-cond.txt.gz
>
> Total log lines: 84076
> Total non DEBUG lines: 61
>
> Question: do we need more than 1 level of DEBUG? 3 orders of magnitude
> information change between INFO -> DEBUG seems too steep a cliff.

Some of them are not useful to me (but might be to others), like the amqp
channel lines. However, everything else has been pretty crucial at one
point or another when debugging issues that span the two tightly-coupled
services.

--Dan
Re: [openstack-dev] RFC - Icehouse logging harmonization
On Thu, Oct 24, 2013 at 07:05:19AM -0700, Dan Smith wrote:

> Some of them are not useful to me (but might be to others), like the amqp
> channel lines. However, everything else has been pretty crucial at one
> point or another when debugging issues that span the two tightly-coupled
> services.

I am completely unfamiliar with the code, so I apologize if these are dumb
questions:

- Is everything making use of Python's logging module?

- Would this be a good use-case for that module's support of file-based
  configuration
  (http://docs.python.org/2/howto/logging.html#configuring-logging)?

This would let a cloud deployer have much more granular control over what
log messages show up (and even where log messages go). For example, maybe
I don't care about messages from quantum.openstack.common.rpc.impl_qpid,
and I generally only want to log WARN and above, but I want to see DEBUG
messages for quantum.plugins.openvswitch.agent.ovs_quantum_agent.

Or is that Too Much Flexibility?

-- 
Lars Kellogg-Stedman l...@redhat.com
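[For concreteness, a minimal sketch of the per-logger control Lars
describes, using only the stdlib's logging.config. The logger names are
the ones from his example; the handler setup is illustrative, not any
project's actual configuration.]

    import logging.config

    LOGGING = {
        'version': 1,
        'disable_existing_loggers': False,
        'handlers': {
            'console': {'class': 'logging.StreamHandler'},
        },
        # Default: only WARN and above, everywhere.
        'root': {'level': 'WARN', 'handlers': ['console']},
        'loggers': {
            # Silence the noisy qpid RPC module almost entirely.
            'quantum.openstack.common.rpc.impl_qpid': {'level': 'ERROR'},
            # But keep full detail for the OVS agent.
            'quantum.plugins.openvswitch.agent.ovs_quantum_agent': {
                'level': 'DEBUG'},
        },
    }

    logging.config.dictConfig(LOGGING)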
Re: [openstack-dev] RFC - Icehouse logging harmonization
On Thu, Oct 24, 2013 at 3:24 PM, Lars Kellogg-Stedman l...@redhat.com wrote:

> - Is everything making use of Python's logging module?

AFAIK yes, and if not we should be.

> - Would this be a good use-case for that module's support of file-based
>   configuration
>   (http://docs.python.org/2/howto/logging.html#configuring-logging)?

Yes, we do that in nova already and should do it in neutron as well, see

http://git.openstack.org/cgit/openstack/nova/tree/etc/nova/nova.conf.sample#n1408
http://git.openstack.org/cgit/openstack/nova/tree/etc/nova/logging_sample.conf

> This would let a cloud deployer have much more granular control over what
> log messages show up (and even where log messages go). [...]
> Or is that Too Much Flexibility?

While I think giving the deployer such fine-grained control is a good
idea, we want to include sane defaults where possible.
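[And the file-based flavor of the same idea, roughly how an ini-style
logging config like nova's is consumed. The file contents below are an
illustrative sketch, not a copy of nova's logging_sample.conf.]

    import logging.config
    import tempfile

    SAMPLE_CONF = """
    [loggers]
    keys = root, ovs_agent

    [handlers]
    keys = console

    [formatters]
    keys = simple

    [logger_root]
    level = WARNING
    handlers = console

    [logger_ovs_agent]
    level = DEBUG
    handlers = console
    qualname = quantum.plugins.openvswitch.agent.ovs_quantum_agent
    propagate = 0

    [handler_console]
    class = StreamHandler
    args = ()
    formatter = simple

    [formatter_simple]
    format = %(asctime)s %(levelname)s %(name)s %(message)s
    """

    # Write the sample out and load it the way a service would load its
    # deployer-provided log config file.
    with tempfile.NamedTemporaryFile('w', suffix='.conf', delete=False) as f:
        f.write(SAMPLE_CONF)
    logging.config.fileConfig(f.name)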
Re: [openstack-dev] RFC - Icehouse logging harmonization
On 10/24/2013 10:05 AM, Dan Smith wrote:

> Some of them are not useful to me (but might be to others), like the amqp
> channel lines. However, everything else has been pretty crucial at one
> point or another when debugging issues that span the two tightly-coupled
> services.

Right, which is definitely why it's a conversation, to figure out what's
useful and what isn't. We definitely don't want to remove things that are
useful.

The amqp lines are 49562 of the DEBUG lines, so dropping those would cut
our DEBUG output by more than half, which would be cool if it didn't have
an impact on folks.

I also just wanted to raise the question: are there multiple levels of
DEBUG that might make sense here? For instance, every 'received' seems to
be followed by an 'unpacked', which actually repeats info that was in the
received hash -
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-n-cond.txt.gz#_2013-10-23_12_25_22_524

If we had DEBUG and DEBUG2 levels, where one of them would only be seen at
the higher debug level, would that be useful?

I'm not actually trying to pick on conductor here, but it makes a good
example of a service where the DEBUG level is extremely useful to
development, and is used heavily, and might make us think about multiple
levels of DEBUG, to go deeper down the rabbit hole only if we really need
to.

-Sean

-- 
Sean Dague
http://dague.net
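[A sketch of what a second debug level could look like. This is
hypothetical, not existing oslo code: the stdlib lets you register extra
levels below DEBUG, so deep-trace messages only appear when a logger is
explicitly opted in. The DEBUG2 name and value are assumptions.]

    import logging

    DEBUG2 = 5  # below DEBUG (10)
    logging.addLevelName(DEBUG2, 'DEBUG2')

    def debug2(logger, msg, *args, **kwargs):
        # Convenience wrapper; skips formatting when DEBUG2 is disabled.
        if logger.isEnabledFor(DEBUG2):
            logger.log(DEBUG2, msg, *args, **kwargs)

    logging.basicConfig(level=DEBUG2)
    log = logging.getLogger('nova.openstack.common.rpc.amqp')

    debug2(log, 'unpacked context: %s', {'request_id': 'req-...'})
    log.debug('received message')  # visible at DEBUG; DEBUG2 only deeper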
Re: [openstack-dev] RFC - Icehouse logging harmonization
> If we had DEBUG and DEBUG2 levels, where one of them would only be seen
> at the higher debug level, would that be useful?

I'm fine with not seeing those for devstack runs, yeah.

--Dan
Re: [openstack-dev] RFC - Icehouse logging harmonization
On 10/24/2013 11:00 AM, Sean Dague wrote:

> If we had DEBUG and DEBUG2 levels, where one of them would only be seen
> at the higher debug level, would that be useful?
>
> [earlier discussion of the n-conductor DEBUG volume snipped]

Note that we can't actually change the level used by other libs, like
amqp. However, we can set more granular logger levels. We could re-define
debug=True to only set up debug for openstack stuff, and add a new option
debug_all=True that enables debugging for *everything*.

-- 
Russell Bryant
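[A rough sketch of that split. The debug/debug_all option names come from
Russell's suggestion but are hypothetical, not existing oslo config
options, and the namespace list is illustrative.]

    import logging

    OPENSTACK_NAMESPACES = ['nova', 'neutron', 'keystone']  # illustrative

    def setup_logging(debug=False, debug_all=False):
        logging.basicConfig(level=logging.INFO)
        if debug_all:
            # Everything, including third-party libs, goes to DEBUG.
            logging.getLogger().setLevel(logging.DEBUG)
        elif debug:
            # Only the openstack namespaces get DEBUG...
            for name in OPENSTACK_NAMESPACES:
                logging.getLogger(name).setLevel(logging.DEBUG)
            # ...while third-party chatter (e.g. the amqp logger) is
            # pinned so debug=True does not flood the logs.
            logging.getLogger('amqp').setLevel(logging.WARN)

    setup_logging(debug=True)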
Re: [openstack-dev] RFC - Icehouse logging harmonization
On a previous project, I wrote a library that provided a command-line
option to set the logging levels for different loggers. This was handy for
developers and for support. An example translated to Keystone would look
like:

  keystone-all --logging=keystone.identity=DEBUG

Now the keystone.identity loggers are at DEBUG while the rest of the
loggers are still at INFO or whatever their default is. This would be used
if you think the problem is in the identity backend. It's more convenient
than editing a config file.

Also, in our config file we listed the important loggers and their default
levels... for keystone it would look like:

  # keystone.identity=INFO
  # keystone.assignment=INFO
  # dogpile=WARNING

This was useful for developers and customers alike, because it made it
easier to figure out what the loggers are.

- Brant

On Thu, Oct 24, 2013 at 10:22 AM, Russell Bryant rbry...@redhat.com wrote:

> Note that we can't actually change the level used by other libs, like
> amqp. However, we can set more granular logger levels. [snipped]
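[A sketch of the mechanism Brant describes, using argparse. The --logging
flag and its LOGGER=LEVEL syntax follow his example; none of this is
actual keystone code.]

    import argparse
    import logging

    parser = argparse.ArgumentParser()
    parser.add_argument('--logging', action='append', default=[],
                        metavar='LOGGER=LEVEL',
                        help='override the level of a single logger')

    # e.g. keystone-all --logging=keystone.identity=DEBUG
    args = parser.parse_args(['--logging=keystone.identity=DEBUG'])

    for override in args.logging:
        name, _, level = override.partition('=')
        logging.getLogger(name).setLevel(getattr(logging, level.upper()))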
Re: [openstack-dev] RFC - Icehouse logging harmonization
On 10/24/2013 11:46 AM, Brant Knudson wrote:

> On a previous project, I wrote a library that provided a command-line
> option to set the logging levels for different loggers. [...] This was
> useful for developers and customers alike, because it made it easier to
> figure out what the loggers are.

We support something like that (from oslo-incubator's logging code):

  # list of logger=LEVEL pairs (list value)
  #default_log_levels=amqplib=WARN,sqlalchemy=WARN,boto=WARN,suds=INFO,keystone=INFO,eventlet.wsgi.server=WARN

-- 
Russell Bryant
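[Applying such a list takes only a few lines of stdlib code. This is a
simplified sketch of the behavior described, not the actual oslo-incubator
implementation.]

    import logging

    default_log_levels = ('amqplib=WARN,sqlalchemy=WARN,boto=WARN,'
                          'suds=INFO,keystone=INFO,'
                          'eventlet.wsgi.server=WARN')

    for pair in default_log_levels.split(','):
        name, _, level = pair.partition('=')
        logging.getLogger(name).setLevel(getattr(logging, level))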
[openstack-dev] RFC - Icehouse logging harmonization
One of the efforts that we're working on from the QA team is tooling that
ensures we aren't stack tracing into our test logs during normal tempest
runs. Random stack traces are scary to cloud admins consuming OpenStack
logs, and exceptions in the logs should really be exceptional events (and
indicative of a failing system), not something that we do by default. Our
intent is to gate code on clean logs (no stack traces) eventually (i.e. if
you try to land a patch that causes stack traces in OpenStack, that
becomes a failing condition), and we've got an incremental whitelist-based
approach that should let us make forward progress on that.

But on that thread -
http://lists.openstack.org/pipermail/openstack-dev/2013-October/017012.html
we exposed another issue... across projects, OpenStack is very
inconsistent with logging.

First... baseline, these are the logging levels that we have in OpenStack
today (with numeric values, higher = worse):

CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
AUDIT = 21  # invented for oslo-logging
INFO = 20
DEBUG = 10
NOTSET = 0

We also have TRACE, which isn't a level per se; it happens at another
level. However, TRACE is typically an ERROR in the way we use it.

Some examples of oddities in the current system (all from a single
devstack/tempest run):

Example 1:
==
n-conductor log in tempest/devstack -
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-n-cond.txt.gz

Total log lines: 84076
Total non DEBUG lines: 61

Question: do we need more than 1 level of DEBUG? 3 orders of magnitude
information change between INFO -> DEBUG seems too steep a cliff.

Example 2:
==
ceilometer-collector -
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-ceilometer-collector.txt.gz

AUDIT log level being used as a DEBUG level (even though it's higher than
INFO):

2013-10-23 12:24:20.093 26234 AUDIT ceilometer.pipeline [-] Flush pipeline meter_pipeline
2013-10-23 12:24:20.093 26234 AUDIT ceilometer.pipeline [-] Flush pipeline cpu_pipeline
2013-10-23 12:24:20.094 26234 AUDIT ceilometer.pipeline [-] Flush pipeline meter_pipeline
2013-10-23 12:24:20.094 26234 AUDIT ceilometer.pipeline [-] Flush pipeline cpu_pipeline

(this is every second, for most seconds, for the entire run)

Example 3:
===
cinder-api -
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-c-api.txt.gz?level=ERROR

ERROR level being used for 404s of volumes.

Example 4:
===
glance-api -
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-g-api.txt.gz

2013-10-23 12:23:27.436 23731 ERROR glance.store.sheepdog [-] Error in store configuration: Unexpected error while running command. Command: collie Exit code: 127 Stdout: '' Stderr: '/bin/sh: 1: collie: not found\n'
2013-10-23 12:23:27.436 23731 WARNING glance.store.base [-] Failed to configure store correctly: Store sheepdog could not be configured correctly. Reason: Error in store configuration: Unexpected error while running command. Command: collie Exit code: 127 Stdout: '' Stderr: '/bin/sh: 1: collie: not found\n' Disabling add method.

This is part of every single Tempest / Devstack run, even though we aren't
trying to configure sheepdog in the gate.

I think we can, and should, do better, and I started trying to brain dump
into this etherpad -
https://etherpad.openstack.org/p/icehouse-logging-harmonization
(examples included).
This is one of those topics that I think our current 6-track summit model
doesn't make it easy to address, as we really need general consensus
across any project that's using oslo-logging, so I believe the mailing
list is the better option, at least for now.

Goals - Short Term
===

As much feedback as possible from both core projects and openstack
deployers about the kinds of things that they believe we should be
logging, and the kinds of levels they think those things should land at.

Determining how crazy it is to try to harmonize this across services.

Figure out who else wants to help. Where help means:

* helping figure out what's already consensus in services
* helping figure out things that are really aberrant from that consensus
* helping build consensus with various core teams on a common
* helping with contributions to projects that are interested in
  contributions to move them closer to the consensus

Determining if everyone just hates the idea, and I should give up now. :)
(That is a valid response to this RFC, feel free to put that out there).

Goals - Long Term
===

A set of guidelines on logging standards so that OpenStack as a whole
feels more whole when it comes to dealing with the log data. These are
going to be guidelines, not rules. Some projects are always going to have
unique needs. But I suspect a lot of
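[For reference, the per-level counts quoted in the examples above can be
reproduced with a short script; a sketch, assuming the oslo log line
format shown in Example 2 (timestamp, pid, level, logger name).]

    import collections
    import re

    LEVEL_RE = re.compile(
        r'\d+ (CRITICAL|ERROR|WARNING|AUDIT|INFO|DEBUG|TRACE) ')

    def count_levels(path):
        # Tally log lines per level; continuation lines (e.g. the body of
        # a stack trace) carry no level marker and are not counted.
        counts = collections.Counter()
        with open(path) as f:
            for line in f:
                m = LEVEL_RE.search(line)
                if m:
                    counts[m.group(1)] += 1
        return counts

    # usage: count_levels('screen-n-cond.txt')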
Re: [openstack-dev] RFC - Icehouse logging harmonization
On Wed, Oct 23, 2013 at 11:20 AM, Sean Dague s...@dague.net wrote:

> [full RFC quoted; see the original post above - snipped]
Re: [openstack-dev] RFC - Icehouse logging harmonization
On Wed, Oct 23, 2013 at 1:20 PM, Sean Dague s...@dague.net wrote:

> [full RFC quoted; see the original post above - snipped]
>
> Goals - Short Term
> ===
>
> As much feedback as possible from both core projects and openstack
> deployers about the kinds of things that they believe we should be
> logging, and the kinds of levels they think those things should land at.

Deprecation warnings! Based on the approach we're taking in the patch
below, we'll be able to notate how imminently a feature is facing
deprecation. Right now, they're just landing in WARNING, but I think we'll
surely see a desire to silence, prioritize or redirect those messages
using different log levels (for example, based on how imminently a feature
is facing deprecation).

https://review.openstack.org/#/c/50486/
Re: [openstack-dev] RFC - Icehouse logging harmonization
On 24 October 2013 08:14, Dolph Mathews dolph.math...@gmail.com wrote:

> Deprecation warnings! Based on the approach we're taking in the patch
> below, we'll be able to notate how imminently a feature is facing
> deprecation. Right now, they're just landing in WARNING, but I think
> we'll surely see a desire to silence, prioritize or redirect those
> messages using different log levels (for example, based on how imminently
> a feature is facing deprecation).
>
> https://review.openstack.org/#/c/50486/

Huh, I did not see that go by. Python already has built-in signalling for
deprecated features; I think we should be using that. We can of course
wrap it with a little sugar to make it easy to encode future deprecations.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
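[The built-in signalling Robert means is the stdlib warnings module. A
minimal sketch of wrapping it with "a little sugar"; the decorator and its
removed_in argument are illustrative, not an agreed-on convention.]

    import functools
    import warnings

    def deprecated(removed_in):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # stacklevel=2 points the warning at the caller's code.
                warnings.warn(
                    '%s is deprecated and will be removed in %s'
                    % (func.__name__, removed_in),
                    DeprecationWarning, stacklevel=2)
                return func(*args, **kwargs)
            return wrapper
        return decorator

    @deprecated(removed_in='Icehouse')
    def old_api():
        pass

[Running under python -W error, or calling warnings.simplefilter('error'),
escalates these into exceptions, which is comparable to the configurable
escalation David mentions below.]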
Re: [openstack-dev] RFC - Icehouse logging harmonization
On Wed, Oct 23, 2013 at 1:03 PM, Clark Boylan clark.boy...@gmail.com wrote:

> On Wed, Oct 23, 2013 at 11:20 AM, Sean Dague s...@dague.net wrote:
>
>> [full RFC quoted; see the original post above - snipped]

So I touched on this a bit in my earlier post but want to reiterate here
and maybe clarify a bit. I agree that cleaning up and standardizing the
logs is a good thing, and particularly removing unhandled exception
messages would be good. What concerns me however is the approach being
taken here of saying things like "Error level messages are banned from
Tempest runs". The case I mentioned earlier of the negative test is a
perfect example. There's no way for Cinder (or any other service) to know
the difference between the end user specifying/requesting a non-existent
volume and a valid volume being requested that for some reason can't be
found. I'm not quite sure how you place a definitive rule like "no error
messages in logs" unless you make your tests such that you never run
negative tests.

There are other cases in cinder as well that I'm concerned about. One
example is iscsi target creation; there are a number of scenarios where
this can fail under certain conditions. In most of these cases we now have
retry mechanisms or alternate implementations to complete the task. The
fact is however that a call somewhere in the system failed, and this
should be something in my opinion that stands out in the logs. Maybe this
particular case would be well suited to being a warning rather than an
error, and that's fine. My point however is that I think some thought
needs to go into this before making blanket rules and especially gating
criteria that says "no error messages in logs".
Re: [openstack-dev] RFC - Icehouse logging harmonization
On 24 October 2013 08:28, John Griffith john.griff...@solidfire.com wrote:

> So I touched on this a bit in my earlier post but want to reiterate here
> and maybe clarify a bit. [...] There's no way for Cinder (or any other
> service) to know the difference between the end user
> specifying/requesting a non-existent volume and a valid volume being
> requested that for some reason can't be found. I'm not quite sure how you
> place a definitive rule like "no error messages in logs" unless you make
> your tests such that you never run negative tests.

Let me check that I understand: you want to check that when a user asks
for a volume that doesn't exist, they don't get it, *and* that the reason
they didn't get it was due to Cinder detecting it's missing, not due to
e.g. Cinder throwing an error and returning a 500?

If so, that seems pretty straightforward: a) check the error that is
reported (it should be a 404 and contain an explanation which we can
check) and b) check the logs to see that nothing was logged (because a
server fault would be logged).

> There are other cases in cinder as well that I'm concerned about. One
> example is iscsi target creation; there are a number of scenarios where
> this can fail under certain conditions. [...] My point however is that I
> think some thought needs to go into this before making blanket rules and
> especially gating criteria that says "no error messages in logs".

I agree thought and care is needed. As a deployer my concern is that the
only time ERROR is logged is when something is wrong with the
infrastructure (rather than a user asking for something stupid). I think
my concern and yours can both be handled at the same time.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
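[A sketch of the two checks Robert outlines for a negative test. The
not-found message text is an assumption, and this is not Tempest code;
new_log_lines stands for whatever log lines the test captured while the
request ran.]

    def assert_clean_not_found(status_code, body, new_log_lines):
        # (a) the error reported to the user is a 404 with an explanation
        # the test can check, not a 5xx server fault
        assert status_code == 404, 'expected a 404, not a server fault'
        assert 'could not be found' in body
        # (b) nothing reached the logs at ERROR (a fault would have)
        assert not any(' ERROR ' in line or ' TRACE ' in line
                       for line in new_log_lines)

    # e.g. with lines captured from screen-c-api.txt during the request:
    assert_clean_not_found(404, 'Volume abc could not be found', [])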
Re: [openstack-dev] RFC - Icehouse logging harmonization
On Wed, Oct 23, 2013 at 3:26 PM, Robert Collins robe...@robertcollins.net wrote:

> Huh, I did not see that go by. Python already has built-in signalling for
> deprecated features; I think we should be using that. We can of course
> wrap it with a little sugar to make it easy to encode future deprecations.

The initial patch used the warnings module instead of the deprecated
logging provided by oslo. We talked about it in IRC and I switched because
there already existed a way to do deprecation messaging in oslo, and it
had a configurable way to turn warnings into exceptions.

I intend to submit a patch to oslo-incubator based off of the above patch.
I'd love to get some feedback about the needs of the other projects.

-- 
David
blog: http://www.traceback.org
twitter: http://twitter.com/dstanek
www: http://dstanek.com
Re: [openstack-dev] RFC - Icehouse logging harmonization
On 10/23/2013 03:35 PM, Robert Collins wrote:

> On 24 October 2013 08:28, John Griffith john.griff...@solidfire.com wrote:
>
>> [concerns about banning ERROR from Tempest runs, negative tests, and
>> iscsi target creation; quoted in full above - snipped]
>
> [Robert's point-by-point reply snipped]

Absolutely agreed. That's why I wanted to kick off this discussion and am
thinking about how we get to agreement by Icehouse (giving this lots of
time to bake and getting different perspectives in here).

On the short term of failing jobs in tempest because they've got errors in
the logs, we've got a whole whitelist mechanism right now for acceptable
errors. Over time I'd love to shrink that to 0. But that's going to be a
collaboration between the QA team and the specific core projects to make
sure that's the right call in each case. Who knows, maybe there are
generally agreed ERROR conditions that we trigger, but we'll figure that
out over time.

I think the iscsi example is a good case for WARNING, which is the same
level we use when we fail to schedule a resource (compute / volume),
especially because we try to recover now. If we fail to recover, ERROR is
probably called for. But if we actually failed to allocate a volume, we'd
end up failing the tests anyway, which means the ERROR in the log wouldn't
be a problem in and of itself.

> I agree thought and care is needed. As a deployer my concern is that the
> only time ERROR is logged is when something is wrong with the
> infrastructure (rather than a user asking for something stupid). I think
> my concern and yours can both be handled at the same time.

Right, and I think this is the perspective that I'm coming from.
Our logs (at INFO and up) are UX to our cloud admins. We should be pretty
sure that we know something is a problem if we tag it as ERROR or
CRITICAL, because that's likely to be something that negatively impacts
someone's day. If we aren't completely sure your cloud is on fire, but
we're pretty sure something is odd, WARNING is appropriate. If it's no
good, but we have no way to test if it's a problem, it's just INFO. I
really think the not-found case falls more into standard INFO.

Again, more concrete instances like the iscsi case are probably the most
helpful. I think in the abstract this problem is too hard to solve, but
with examples we can probably come to some consensus.

-Sean

-- 
Sean Dague
http://dague.net
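[A sketch of the level policy Sean describes for the iscsi case: each
failed attempt that can still be recovered from is a WARNING, and only
exhausting the retries is an ERROR. The create_target callable, logger
name, and retry count are illustrative.]

    import logging

    LOG = logging.getLogger('cinder.volume.iscsi')

    def create_target_with_retries(create_target, attempts=3):
        for attempt in range(1, attempts + 1):
            try:
                return create_target()
            except Exception:
                if attempt < attempts:
                    # Something failed, but we can still recover: WARNING.
                    LOG.warning('iscsi target creation failed '
                                '(attempt %d of %d), retrying',
                                attempt, attempts)
                else:
                    # Recovery failed; this is infrastructure-level: ERROR.
                    LOG.error('iscsi target creation failed after '
                              '%d attempts', attempts)
                    raise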