Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-27 Thread Julien Danjou
On Wed, Oct 23 2013, Sean Dague wrote:

 Goals - Short Term
 ===
 As much feedback as possible from both core projects and openstack deployers
 about the kinds of things that they believe we should be logging, and the
 kinds of levels they think those things should land at.

 Determining how crazy it is to try to harmonize this across services.

 Figure out who else wants to help. Where help means:
 * helping figure out what's already consensus in services
 * helping figure out things that are really aberrant from that consensus
 * helping build consensus with various core teams on a common
 * helping with contributions to projects that are interested in
contributions to move them closer to the consensus

 Determining if everyone just hates the idea, and I should give up now. :)
 (That is a valid response to this RFC, feel free to put that out there).

Big +1 from here. We want to leverage the logs from all projects in
Ceilometer by coupling them with notifications in the future, so having
them be of *very* good quality is going to be a must.

-- 
Julien Danjou
// Free Software hacker / independent consultant
// http://julien.danjou.info




Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-24 Thread Dan Smith
 Example 1:
 ==
 
 n-conductor log in tempest/devstack -
 http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-n-cond.txt.gz
 
 
 Total log lines: 84076
 Total non DEBUG lines: 61
 
 Question: do we need more than 1 level of DEBUG? A 3-orders-of-magnitude
 change in information between INFO and DEBUG seems too steep a cliff.

Some of them are not useful to me (but might be to others), like the
amqp channel lines. However, everything else has been pretty crucial at
one point or another when debugging issues that span between the two
tightly-coupled services.

--Dan



Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-24 Thread Lars Kellogg-Stedman
On Thu, Oct 24, 2013 at 07:05:19AM -0700, Dan Smith wrote:
 Some of them are not useful to me (but might be to others), like the
 amqp channel lines. However, everything else has been pretty crucial at
 one point or another when debugging issues that span between the two
 tightly-coupled services.

I am completely unfamiliar with the code, so I apologize if these are
dumb questions:

- Is everything making use of Python's logging module?
- Would this be a good use case for that module's support of
  file-based configuration
  (http://docs.python.org/2/howto/logging.html#configuring-logging)?

This would let a cloud deployer have much more granular control
over what log messages show up (and even what log messages go where).
For example, maybe I don't care about messages from
quantum.openstack.common.rpc.impl_qpid, and I generally only want to
log WARN and above, but I want to see DEBUG messages for
quantum.plugins.openvswitch.agent.ovs_quantum_agent.

Or is that Too Much Flexibility?
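As a minimal sketch (nothing OpenStack ships today, just what the stdlib
already allows), logging.config.dictConfig can express exactly that kind of
per-logger policy; the logger names below are the ones from the example above:

import logging
import logging.config

LOG_CONFIG = {
    'version': 1,
    'formatters': {
        'plain': {'format': '%(asctime)s %(process)d %(levelname)s %(name)s %(message)s'},
    },
    'handlers': {
        'console': {'class': 'logging.StreamHandler', 'formatter': 'plain'},
    },
    'loggers': {
        # keep the chatty rpc driver quiet unless something is actually wrong
        'quantum.openstack.common.rpc.impl_qpid': {'level': 'WARNING'},
        # but show everything for the one component being debugged
        'quantum.plugins.openvswitch.agent.ovs_quantum_agent': {'level': 'DEBUG'},
    },
    # everything else: WARN and above only
    'root': {'level': 'WARNING', 'handlers': ['console']},
}

logging.config.dictConfig(LOG_CONFIG)
logging.getLogger('quantum.plugins.openvswitch.agent.ovs_quantum_agent').debug('shown')
logging.getLogger('quantum.openstack.common.rpc.impl_qpid').debug('suppressed')

The same policy can also live in an external file consumed with
logging.config.fileConfig, which is what a deployer-editable setup would be.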

-- 
Lars Kellogg-Stedman l...@redhat.com





Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-24 Thread Joe Gordon
On Thu, Oct 24, 2013 at 3:24 PM, Lars Kellogg-Stedman l...@redhat.com wrote:

 On Thu, Oct 24, 2013 at 07:05:19AM -0700, Dan Smith wrote:
  Some of them are not useful to me (but might be to others), like the
  amqp channel lines. However, everything else has been pretty crucial at
  one point or another when debugging issues that span between the two
  tightly-coupled services.

 I am completely unfamiliar with the code, so I apologize if these are
 dumb questions:

 - Is everything making use of Python's logging module?


AFAIK yes, and if not we should be.


 - Would this be a good use case for that module's support of
   file-based configuration
   (http://docs.python.org/2/howto/logging.html#configuring-logging)?

 Yes, we do that in nova already and should do it in neutron as well, see
http://git.openstack.org/cgit/openstack/nova/tree/etc/nova/nova.conf.sample#n1408
http://git.openstack.org/cgit/openstack/nova/tree/etc/nova/logging_sample.conf


 This would let a cloud deployer have much more granular control
 over what log messages show up (and even what log messages go where).
 For example, maybe I don't care about messages from
 quantum.openstack.common.rpc.impl_qpid, and I generally only want to
 log WARN and above, but I want to see DEBUG messages for
 quantum.plugins.openvswitch.agent.ovs_quantum_agent.

 Or is that Too Much Flexibility?


While I think giving the deployer such fine-grained control is a good idea,
we want to include sane defaults when possible.



 --
 Lars Kellogg-Stedman l...@redhat.com






Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-24 Thread Sean Dague

On 10/24/2013 10:05 AM, Dan Smith wrote:

Example 1:
==

 n-conductor log in tempest/devstack -
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-n-cond.txt.gz


 Total log lines: 84076
 Total non DEBUG lines: 61

 Question: do we need more than 1 level of DEBUG? A 3-orders-of-magnitude
change in information between INFO and DEBUG seems too steep a cliff.


Some of them are not useful to me (but might be to others), like the
amqp channel lines. However, everything else has been pretty crucial at
one point or another when debugging issues that span between the two
tightly-coupled services.


Right, which is definitely why it's a conversation, to figure out what's 
useful, and what isn't. We definitely don't want to remove things that 
are useful.


The amqp lines are 49562 of the DEBUG lines, so dropping those would 
cut our DEBUG output by more than half, which would be cool if it 
didn't have an impact on folks.


I also just wanted to raise the question: are there multiple levels of 
DEBUG that might make sense here? For instance, every "received" line seems to 
be followed by an "unpacked" line, which actually has info that was in the 
received hash - 
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-n-cond.txt.gz#_2013-10-23_12_25_22_524


If we had DEBUG and DEBUG2 levels, where one of them would only be seen 
at the higher debug level, would that be useful?


I'm not actually trying to pick on conductor here, but it makes a good 
example of a service where the DEBUG level is extremely useful to 
development and is used heavily, and it might make us think about multiple 
levels of DEBUG so we only go deeper down the rabbit hole if we really need to.
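As a rough sketch of what that could look like (the DEBUG2 level and logger
name here are purely hypothetical, not existing oslo-logging code), Python's
logging module does allow registering extra levels below DEBUG:

import logging

DEBUG2 = 5  # more verbose than DEBUG (10); only visible when explicitly enabled
logging.addLevelName(DEBUG2, 'DEBUG2')

def debug2(logger, msg, *args, **kwargs):
    if logger.isEnabledFor(DEBUG2):
        logger.log(DEBUG2, msg, *args, **kwargs)

logging.basicConfig()
LOG = logging.getLogger('nova.conductor')   # illustrative logger name

LOG.setLevel(logging.DEBUG)
LOG.debug('received: {...}')                # normal DEBUG, still shown
debug2(LOG, 'unpacked context: {...}')      # hidden at plain DEBUG

LOG.setLevel(DEBUG2)
debug2(LOG, 'unpacked context: {...}')      # now shown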


-Sean

--
Sean Dague
http://dague.net



Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-24 Thread Dan Smith
 If we had DEBUG and DEBUG2 levels, where one of them would only be seen
 at the higher debug level, would that be useful?

I'm fine with not seeing those for devstack runs, yeah.

--Dan



Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-24 Thread Russell Bryant
On 10/24/2013 11:00 AM, Sean Dague wrote:
 Right, which is definitely why it's a conversation, to figure out what's
 useful, and what isn't. We definitely don't want to remove things that
 are useful.
 
 The amqp lines are 49562 of the DEBUG lines, so dropping those would
 cut our DEBUG output by more than half, which would be cool if it
 didn't have an impact on folks.
 
 I also just wanted to raise the question, are there multiple levels of
 DEBUG that might make sense here? For instance, every received seems to
 be followed by an unpacked, which actually has info that was in the
 received hash -
 http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-n-cond.txt.gz#_2013-10-23_12_25_22_524
 
 
 If we had DEBUG and DEBUG2 levels, where one of them would only be seen
 at the higher debug level, would that be useful?
 
 I'm not actually trying to pick on conductor here, but it makes a good
 example of a service where the DEBUG level is extremely useful to
 development and is used heavily, and it might make us think about multiple
 levels of DEBUG so we only go deeper down the rabbit hole if we really need
 to.

Note that we can't actually change the level used by other libs, like
amqp.  However, we can set more granular logger levels.  We could
re-define debug=True to only set up debug for openstack stuff, and add a
new option debug_all=True that enables debugging for *everything*.
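A rough sketch of that split, assuming hypothetical debug/debug_all flags and
an illustrative list of third-party logger names (this is not current oslo
behaviour):

import logging

THIRD_PARTY_LOGGERS = ('amqplib', 'qpid.messaging', 'sqlalchemy', 'boto')

def setup_levels(debug=False, debug_all=False):
    # debug: DEBUG for the OpenStack code itself (everything under the root)
    logging.getLogger().setLevel(
        logging.DEBUG if (debug or debug_all) else logging.INFO)
    # third-party libraries stay at WARNING unless *everything* was asked for
    for name in THIRD_PARTY_LOGGERS:
        logging.getLogger(name).setLevel(
            logging.DEBUG if debug_all else logging.WARNING)

setup_levels(debug=True)       # OpenStack DEBUG, amqp channel chatter suppressed
setup_levels(debug_all=True)   # today's behaviour: absolutely everything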

-- 
Russell Bryant



Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-24 Thread Brant Knudson
On a previous project, I wrote a library that provided a command-line
option to set the logging levels for different loggers. This was handy for
developers and for support. An example translated to Keystone would look like:

keystone-all --logging=keystone.identity=DEBUG

Now the keystone.identity loggers are DEBUG while the rest of the loggers
are still at INFO or whatever their default is. This would be used if you
think the problem is in the identity backend. It's more convenient than
editing a config file.

Also, in our config file we listed the important loggers and had the
default levels for them... for Keystone it would look like:

# keystone.identity=INFO
# keystone.assignment=INFO
# dogpile=WARNING

This was useful for developers and customers alike, because it was then
easier to figure out what the loggers are.
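A small sketch of that kind of switch (the --logging option handling below is
illustrative, not an existing keystone-all implementation):

import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument('--logging', action='append', default=[],
                    metavar='LOGGER=LEVEL',
                    help='override the level of a single logger; repeatable')
args = parser.parse_args(['--logging=keystone.identity=DEBUG',
                          '--logging=dogpile=WARNING'])

for pair in args.logging:
    name, _, level = pair.partition('=')
    # only the named loggers change; everything else keeps its default level
    logging.getLogger(name).setLevel(getattr(logging, level.upper()))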

- Brant






Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-24 Thread Russell Bryant
On 10/24/2013 11:46 AM, Brant Knudson wrote:
 
 On a previous project, I wrote a library that provided a command-line
 option to set the logging levels for different loggers. This was handy
 for developers and for support. An example translated to Keystone would
 be like
 
 keystone-all --logging=keystone.identity=DEBUG
 
 Now the keystone.identity loggers are DEBUG while the rest of the
 loggers are still at INFO or whatever their default is. This would be
 used if you think the problem is in the identity backend. It's more
 convenient than editing a config file.
 
 Also, in our config file we listed the important loggers and had the
 default levels for them... for keystone it would be like
 
 # keystone.identity=INFO
 # keystone.assignment=INFO
 # dogpile=WARNING
 
 This was useful for developers and customers alike, because it was then
 easier to figure out what the loggers are.

We support something like that (from oslo-incubator's logging code):

# list of logger=LEVEL pairs (list value)
#default_log_levels=amqplib=WARN,sqlalchemy=WARN,boto=WARN,suds=INFO,keystone=INFO,eventlet.wsgi.server=WARN
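As a simplified sketch of what that option amounts to (not the actual
oslo-incubator setup code), each name=LEVEL pair just becomes a setLevel call
on the named logger:

import logging

default_log_levels = ('amqplib=WARN,sqlalchemy=WARN,boto=WARN,'
                      'suds=INFO,keystone=INFO,eventlet.wsgi.server=WARN')

for pair in default_log_levels.split(','):
    name, _, level = pair.partition('=')
    logging.getLogger(name).setLevel(getattr(logging, level))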

-- 
Russell Bryant



[openstack-dev] RFC - Icehouse logging harmonization

2013-10-23 Thread Sean Dague
One of the efforts that we're working on from the QA team is tooling 
that ensures we aren't stack tracing into our test logs during normal 
tempest runs. Random stack traces are scary to cloud admins consuming 
OpenStack logs, and exceptions in the logs should really be exceptional 
events (and indicative of a failing system), not something that we do by 
default. Our intent is to gate code on clean logs (no stacktraces) 
eventually (i.e. if you try to land a patch that causes stack traces in 
OpenStack, that becomes a failing condition), and we've got an 
incremental white list based approach that should let us make forward 
progress on that. But on that thread - 
http://lists.openstack.org/pipermail/openstack-dev/2013-October/017012.html 
we exposed another issue... across projects, OpenStack is very 
inconsistent with logging.


First... baseline, these are the logging levels that we have in 
OpenStack today (with numeric values, higher = worse):


CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
AUDIT = 21  # invented for oslo-logging
INFO = 20
DEBUG = 10
NOTSET = 0

We also have TRACE, which isn't a level per se; it happens at another 
level. However, TRACE is typically an ERROR in the way we use it.



Some examples of oddities in the current system (all from a single 
devstack/tempest run):


Example 1:
==

n-conductor log in tempest/devstack - 
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-n-cond.txt.gz


Total log lines: 84076
Total non DEBUG lines: 61

Question: do we need more than 1 level of DEBUG? A 3-orders-of-magnitude 
change in information between INFO and DEBUG seems too steep a cliff.


Example 2:
==

ceilometer-collector - 
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-ceilometer-collector.txt.gz


AUDIT log level being used as DEBUG level (even though it's higher 
than INFO).


2013-10-23 12:24:20.093 26234 AUDIT ceilometer.pipeline [-] Flush 
pipeline meter_pipeline
2013-10-23 12:24:20.093 26234 AUDIT ceilometer.pipeline [-] Flush 
pipeline cpu_pipeline
2013-10-23 12:24:20.094 26234 AUDIT ceilometer.pipeline [-] Flush 
pipeline meter_pipeline
2013-10-23 12:24:20.094 26234 AUDIT ceilometer.pipeline [-] Flush 
pipeline cpu_pipeline


(this is every second, for most seconds, for the entire run)

Example 3:
===

cinder-api - 
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-c-api.txt.gz?level=ERROR

ERROR level being used for 404s of volumes

Example 4:
===
glance-api - 
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-g-api.txt.gz


2013-10-23 12:23:27.436 23731 ERROR glance.store.sheepdog [-] Error in 
store configuration: Unexpected error while running command. Command: collie 
Exit code: 127 Stdout: '' Stderr: '/bin/sh: 1: collie: not found\n'
2013-10-23 12:23:27.436 23731 WARNING glance.store.base [-] Failed to 
configure store correctly: Store sheepdog could not be configured 
correctly. Reason: Error in store configuration: Unexpected error while 
running command. Command: collie Exit code: 127 Stdout: '' Stderr: 
'/bin/sh: 1: collie: not found\n' Disabling add method.


This is part of every single Tempest / Devstack run, even though we aren't 
trying to configure sheepdog in the gate.



I think we can, and should do better, and started trying to brain dump 
into this etherpad - 
https://etherpad.openstack.org/p/icehouse-logging-harmonization 
(examples included).


This is one of those topics that I think our current 6-track summit 
model doesn't make easy to address, as we really need general consensus 
across every project that's using oslo-logging, so I believe the mailing 
list is the better option, at least for now.



Goals - Short Term
===
As much feedback as possible from both core projects and openstack 
deployers about the kinds of things that they believe we should be 
logging, and the kinds of levels they think those things should land at.


Determining how crazy it is to try to harmonize this across services.

Figure out who else wants to help. Where help means:
 * helping figure out what's already consensus in services
 * helping figure out things that are really aberrant from that consensus
 * helping build consensus with various core teams on a common
 * helping with contributions to projects that are interested in 
contributions to move them closer to the consensus


Determining if everyone just hates the idea, and I should give up now. 
:) (That is a valid response to this RFC, feel free to put that out there).



Goals - Long Term
===
A set of guidelines on logging standards so that OpenStack as a whole 
feels more unified when it comes to dealing with the log data.


These are going to be guidelines, not rules. Some projects are always 
going to have unique needs. But I suspect a lot of 


Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-23 Thread Dolph Mathews
On Wed, Oct 23, 2013 at 1:20 PM, Sean Dague s...@dague.net wrote:

 Goals - Short Term
 ===
 As much feedback as possible from both core projects and openstack
 deployers about the kinds of things that they believe we should be logging,
 and the kinds of levels they think those things should land at.


Deprecation warnings!

Based on the approach we're taking in the patch below, we'll be able to
notate how imminently a feature is facing deprecation. Right now, they're
just landing in WARNING, but I think we'll surely see a desire to silence,
prioritize or redirect those messages using different log levels (for
example, based on how imminently a feature is facing deprecation).

https://review.openstack.org/#/c/50486/

Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-23 Thread Robert Collins
On 24 October 2013 08:14, Dolph Mathews dolph.math...@gmail.com wrote:

 On Wed, Oct 23, 2013 at 1:20 PM, Sean Dague s...@dague.net wrote:


 Deprecation warnings!

 Based on the approach we're taking in the patch below, we'll be able to
 notate how imminently a feature is facing deprecation. Right now, they're
 just landing in WARNING, but I think we'll surely see a desire to silence,
 prioritize or redirect those messages using different log levels (for
 example, based on how imminently a feature is facing deprecation).

 https://review.openstack.org/#/c/50486/

Huh, I did not see that go by. Python already has built-in signalling
for deprecated features; I think we should be using that. We can of
course wrap it with a little sugar to make it easy to encode future
deprecations.
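A sketch of that sugar, using only the stdlib warnings module (the decorator
name and example function are hypothetical, not an existing helper):

import functools
import warnings

def deprecated(reason, remove_in):
    """Mark a callable as deprecated using Python's built-in warnings."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn('%s is deprecated and will be removed in %s: %s'
                          % (func.__name__, remove_in, reason),
                          DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated('use list_projects() instead', remove_in='Icehouse')
def list_tenants():
    return []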

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-23 Thread John Griffith
On Wed, Oct 23, 2013 at 1:03 PM, Clark Boylan clark.boy...@gmail.com wrote:

So I touched on this a bit in my earlier post but want to reiterate here and
maybe clarify a bit.  I agree that cleaning up and standardizing the logs is
a good thing, and particularly removing unhandled exception messages would
be good.  What concerns me however is the approach being taken here of
saying things like Error level messages are banned from Tempest runs.

The case I mentioned earlier of the negative test is a perfect example.
There's no way for Cinder (or any other service) to know the difference
between the end user specifying/requesting a non-existent volume and a valid
volume being requested that for some reason can't be found.  I'm not quite
sure how you place a definitive rule like no error messages in logs unless
you make your tests such that you never run negative tests?

There are other cases in cinder as well that I'm concerned about.  One
example is iscsi target creation; there are a number of scenarios where this
can fail under certain conditions.  In most of these cases we now have retry
mechanisms or alternate implementations to complete the task.  The fact is
however that a call somewhere in the system failed, and this should be
something that, in my opinion, stands out in the logs.  Maybe this particular
case would be well suited to being a warning rather than an error, and that's
fine.  My point however is that I think some thought needs to go into this
before making blanket rules and especially gating criteria that say no
error messages in logs.

Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-23 Thread Robert Collins
On 24 October 2013 08:28, John Griffith john.griff...@solidfire.com wrote:
 So I touched on this a bit in my earlier post but want to reiterate here and
 maybe clarify a bit.  I agree that cleaning up and standardizing the logs is
 a good thing, and particularly removing unhandled exception messages would
 be good.  What concerns me however is the approach being taken here of
 saying things like Error level messages are banned from Tempest runs.

 The case I mentioned earlier of the negative test is a perfect example.
 There's no way for Cinder (or any other service) to know the difference
 between the end user specifying/requesting a non-existent volume and a valid
 volume being requested that for some reason can't be found.  I'm not quite
 sure how you place a definitive rule like no error messages in logs unless
 you make your tests such that you never run negative tests?

Let me check that I understand: you want to check that when a user
asks for a volume that doesn't exist, they don't get it, *and* that
the reason they didn't get it was due to Cinder detecting it's
missing, not due to e.g. cinder throwing an error and returning a 500?

If so, that seems pretty straightforward: a) check the error that is
reported (it should be a 404 and contain an explanation which we can
check) and b) check the logs to see that nothing was logged (because a
server fault would be logged).
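A sketch of those two checks, using python-cinderclient and an illustrative
log path rather than the real tempest plumbing (the function and its
arguments are hypothetical):

from cinderclient import exceptions   # assumes python-cinderclient

def check_missing_volume_handling(client, api_log_path):
    # a) the user-facing error should be a clean 404, not a 500
    try:
        client.volumes.get('00000000-0000-0000-0000-000000000000')
    except exceptions.NotFound:
        pass
    else:
        raise AssertionError('expected a 404 for a nonexistent volume')

    # b) the service should not have logged a server fault for it
    with open(api_log_path) as log_file:
        faults = [line for line in log_file
                  if ' ERROR ' in line or ' TRACE ' in line]
    assert not faults, 'unexpected server-side errors: %r' % faults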

 There are other cases in cinder as well that I'm concerned about.  One
 example is iscsi target creation, there are a number of scenarios where this
 can fail under certain conditions.  In most of these cases we now have retry
 mechanisms or alternate implementations to complete the task.  The fact is
 however that a call somewhere in the system failed, this should be something
 in my opinion that stands out in the logs.  Maybe this particular case would
 be well suited to being a warning other than an error, and that's fine.  My
 point however though is that I think some thought needs to go into this
 before making blanketing rules and especially gating criteria that says no
 error messages in logs.

I agree thought and care is needed. As a deployer my concern is that
the only time ERROR is logged in the logs is when something is wrong
with the infrastructure (rather than a user asking for something
stupid). I think my concern and yours can both be handled at the same
time.


-Rob


---
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-23 Thread David Stanek
On Wed, Oct 23, 2013 at 3:26 PM, Robert Collins
robe...@robertcollins.net wrote:

 On 24 October 2013 08:14, Dolph Mathews dolph.math...@gmail.com wrote:
 
  On Wed, Oct 23, 2013 at 1:20 PM, Sean Dague s...@dague.net wrote:

 
  Deprecation warnings!
 
  Based on the approach we're taking in the patch below, we'll be able to
  notate how imminently a feature is facing deprecation. Right now, they're
   just landing in WARNING, but I think we'll surely see a desire to silence,
  prioritize or redirect those messages using different log levels (for
  example, based on how imminently a feature is facing deprecation).
 
  https://review.openstack.org/#/c/50486/

 Huh, I did not see that go by. Python already has built in signalling
 for deprecated features; I think we should be using that. We can of
 course wrap it with a little sugar to make it easy to encode future
 deprecations.


The initial patch used the warnings module instead of using the
deprecation logging provided by oslo.  We talked about it in IRC and I switched
because there existed a way to do deprecation messaging in oslo and it had
a configurable way to turn warnings into exceptions.
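For comparison, a sketch of how the stdlib side of that looks; the point about
oslo is that the equivalent behaviour is driven by configuration (the
fatal-deprecations style option referenced in the comment is an assumption
here, not a verified option name):

import warnings

# promote deprecation warnings to hard failures, which is what a
# fatal-deprecations style option would switch on for a strict deployment
warnings.simplefilter('error', DeprecationWarning)

try:
    warnings.warn('the XML API is deprecated', DeprecationWarning)
except DeprecationWarning as exc:
    print('deprecated call would now fail loudly: %s' % exc)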

I intend to submit a patch to oslo-incubator based off of the above patch.
 I'd love to get some feedback about the needs of the other projects.


-- 
David
blog: http://www.traceback.org
twitter: http://twitter.com/dstanek
www: http://dstanek.com


Re: [openstack-dev] RFC - Icehouse logging harmonization

2013-10-23 Thread Sean Dague

On 10/23/2013 03:35 PM, Robert Collins wrote:

On 24 October 2013 08:28, John Griffith john.griff...@solidfire.com wrote:

So I touched on this a bit in my earlier post but want to reiterate here and
maybe clarify a bit.  I agree that cleaning up and standardizing the logs is
a good thing, and particularly removing unhandled exception messages would
be good.  What concerns me however is the approach being taken here of
saying things like Error level messages are banned from Tempest runs.

The case I mentioned earlier of the negative test is a perfect example.
There's no way for Cinder (or any other service) to know the difference
between the end user specifying/requesting a non-existent volume and a valid
volume being requested that for some reason can't be found.  I'm not quite
sure how you place a definitive rule like no error messages in logs unless
you make your tests such that you never run negative tests?


Let me check that I understand: you want to check that when a user
asks for a volume that doesn't exist, they don't get it, *and* that
the reason they didn't get it was due to Cinder detecting it's
missing, not due to e.g. cinder throwing an error and returning 500 ?

If so, that seems pretty straight forward; a) check the error that is
reported (it should be a 404 and contain an explanation which we can
check) and b) check the logs to see that nothing was logged (because a
server fault would be logged).


There are other cases in cinder as well that I'm concerned about.  One
example is iscsi target creation, there are a number of scenarios where this
can fail under certain conditions.  In most of these cases we now have retry
mechanisms or alternate implementations to complete the task.  The fact is
however that a call somewhere in the system failed, this should be something
in my opinion that stands out in the logs.  Maybe this particular case would
be well suited to being a warning other than an error, and that's fine.  My
point however though is that I think some thought needs to go into this
before making blanketing rules and especially gating criteria that says no
error messages in logs.


Absolutely agreed. That's why I wanted to kick off this discussion and 
am thinking about how we get to agreement by Icehouse (giving this lots 
of time to bake and getting different perspectives in here).


In the short term, for failing tempest jobs because they've got errors 
in the logs, we've got a whole white list mechanism right now for 
acceptable errors. Over time I'd love to shrink that to 0, but that's 
going to be a collaboration between the QA team and the specific core 
projects to make sure it's the right call in each case. Who knows, 
maybe there are generally agreed-to ERROR conditions that we trigger, 
but we'll figure that out over time.
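A rough sketch of that white-list idea (the patterns and log path are
illustrative, not the actual tempest/devstack tooling):

import re

# illustrative patterns; the real list lives with the gate tooling
WHITELIST = [
    re.compile(r'glance\.store\.sheepdog .* collie: not found'),
    re.compile(r'cinder\.api .* VolumeNotFound'),
]

def unexpected_errors(log_path):
    bad = []
    with open(log_path) as log_file:
        for line in log_file:
            if ' ERROR ' not in line and ' TRACE ' not in line:
                continue
            if any(pattern.search(line) for pattern in WHITELIST):
                continue
            bad.append(line)
    return bad

# a gate job would fail the run if unexpected_errors(...) returns anything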


I think the iscsi example is a good case for WARNING, which is the same 
level we use when we fail to schedule a resource (compute / volume), 
especially because we try to recover now. If we fail to recover, ERROR 
is probably called for. But if we actually failed to allocate a volume, 
we'd end up failing the tests anyway, which means the ERROR in the log 
wouldn't be a problem in and of itself.



I agree thought and care is needed. As a deployer my concern is that
the only time ERROR is logged in the logs is when something is wrong
with the infrastructure (rather than a user asking for something
stupid). I think my concern and yours can both be handled at the same
time.


Right, and I think this is the perspective that I'm coming from. Our 
logs (at INFO and up) are UX to our cloud admins.


We should be pretty sure that we know something is a problem if we tag 
it as an ERROR or CRITICAL, because that's likely to be something that 
negatively impacts someone's day.


If we aren't completely sure your cloud is on fire, but we're pretty 
sure something is odd, WARNING is appropriate.


If it's no good, but we have no way to test if it's a problem, it's just 
INFO. I really think the not found case falls more into standard INFO.


Again, more concrete instances, like the iscsi case, are probably the 
most helpful. I think in the abstract this problem is too hard to solve, 
but with examples we can probably come to some consensus.


-Sean

--
Sean Dague
http://dague.net
