Take a look at the logging page in the docs:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperInternals.html#sc_logging

Some good guidelines in there. Basically we log things at INFO level that are interesting/informational but not logged so frequently that they fill the log. WARN is for things that are bad but that we can handle (like network connectivity failure). ERROR is generally for things we don't expect and are unlikely to be able to handle. FATAL means really bad: we shut down the server. Many end users log only at WARN level or higher in production, so typically we err on the side of WARN for issues (so that we have a shot at debugging after the fact). Over time, as we gain confidence in production environments, we've been pushing more things that were WARN down to INFO.
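
As a rough illustration of the guidelines above, here is a minimal sketch that turns them into a decision function. The class, enum, and method names are hypothetical and exist only for this example; this is not ZooKeeper code, and the real rules in the docs are more nuanced than three yes/no questions.

```java
// Hypothetical sketch of the level guidelines; not part of ZooKeeper.
final class LogLevelGuide {
    // Declared in increasing order of severity so compareTo() can be used as a threshold check.
    enum Severity { INFO, WARN, ERROR, FATAL }

    /** Picks a level for an event based on the guidelines described above. */
    static Severity choose(boolean isProblem, boolean canHandle, boolean mustShutDown) {
        if (mustShutDown) {
            return Severity.FATAL;   // really bad: the server shuts down
        }
        if (!isProblem) {
            return Severity.INFO;    // interesting/informational, assumed infrequent
        }
        // Bad but handled (e.g. network connectivity failure) is WARN; unexpected and unhandled is ERROR.
        return canHandle ? Severity.WARN : Severity.ERROR;
    }

    /** True if a message at this level survives a production config that logs WARN or higher. */
    static boolean visibleAtWarnThreshold(Severity severity) {
        return severity.compareTo(Severity.WARN) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(choose(false, true, false));  // informational event
        System.out.println(choose(true, true, false));   // handled connectivity failure
        System.out.println(choose(true, false, false));  // unexpected, unhandled problem
        System.out.println(choose(true, false, true));   // server must stop
    }
}
```

The threshold check is the reason for erring on the side of WARN: anything classified as INFO is invisible to users who only log WARN or higher.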

I fixed a number of JIRAs for 3.3 related to logging. In particular I cleaned up the client session logging significantly. The most fertile area right now for cleaning up logging is the quorum code. That code in particular has issues with providing sufficient information to debug error conditions. You can easily see this by starting an ensemble of more than one machine and killing one or more of the servers. There are many places where the logging is insufficient (e.g. "got vote", which doesn't say what the vote was or what the effect of such a vote is). Improved logging in this area would really help.
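
To make the "got vote" complaint concrete, here is a small sketch of what a more useful message could look like. Everything in it is an assumption for illustration: the class name, the parameters (sender id, proposed leader, zxid) and the wording are hypothetical, not the actual quorum code or its Vote class.

```java
import java.util.Locale;

// Hypothetical sketch of a more descriptive vote message; not the actual quorum code.
final class VoteLogExample {
    /** The kind of message being complained about: no content, no consequence. */
    static String terse() {
        return "got vote";
    }

    /** States who voted, what the vote was, and what receiving it caused. */
    static String describe(long fromServerId, long proposedLeader, long proposedZxid, String effect) {
        return String.format(Locale.ROOT,
                "Got vote from server %d: proposed leader=%d, zxid=0x%x; %s",
                fromServerId, proposedLeader, proposedZxid, effect);
    }

    public static void main(String[] args) {
        System.out.println(terse());
        System.out.println(describe(2L, 3L, 0x100000005L, "updated own proposal"));
    }
}
```

The point is only that the message carries the vote itself and its effect, so someone reading the log does not have to open the source to work out what happened.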

Try searching the JIRA
https://issues.apache.org/jira/browse/ZOOKEEPER
for open/closed issues re "log4j" or "logging" or "log" for further insight.

Patrick

Benjamin Reed wrote:
awesome! that would be great ivan. i'm sure pat has some more concrete suggestions, but one simple thing to do is to run the unit tests and look at the log messages that get output. there are a couple of categories of things that need to be fixed (this is in no way exhaustive):

1) messages that have useful information, but only if you look in the code to figure out what it means. there are some leader election messages that fall into this category. it would be nice to clarify them.
2) there are error messages that really aren't errors. when shutting down there are a bunch of errors that are expected, but still logged, for example.
3) misclassified error levels
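
one way to handle category 2 could look like the sketch below: remember that a shutdown is in progress and log expected connection errors at a low level instead of as errors. this is only an illustration under assumptions: the class and method names are made up, and it uses java.util.logging levels (FINE standing in for DEBUG, WARNING for WARN) to stay self-contained rather than the log4j setup the server actually uses.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: downgrade errors that are expected while shutting down.
final class ShutdownAwareLogging {
    private static final Logger LOG = Logger.getLogger(ShutdownAwareLogging.class.getName());

    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    /** Call once at the start of an orderly shutdown. */
    void beginShutdown() {
        shuttingDown.set(true);
    }

    /** Connection errors are expected during shutdown, so they drop to a debug-style level. */
    Level levelForConnectionError() {
        return shuttingDown.get() ? Level.FINE : Level.WARNING;
    }

    /** Logs the error at whichever level currently applies. */
    void logConnectionError(Exception cause) {
        LOG.log(levelForConnectionError(), "connection closed: " + cause.getMessage(), cause);
    }

    public static void main(String[] args) {
        ShutdownAwareLogging logging = new ShutdownAwareLogging();
        logging.logConnectionError(new java.io.IOException("peer reset"));   // logged as WARNING
        logging.beginShutdown();
        logging.logConnectionError(new java.io.IOException("socket closed")); // logged as FINE
    }
}
```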

welcome aboard!

ben

On 03/29/2010 10:07 AM, Ivan Kelly wrote:
Hi,

I'm going to be using Zookeeper quite extensively for a project in a
few weeks, but development hasn't kicked off yet. This means I have
some time on my hands and I'd like to get familiar with zookeeper
beforehand by perhaps writing some tools to make debugging problems
with it easier so as to save myself some time in the future. Problem
is I haven't had to debug many zookeeper problems yet, so I don't know
where the pain points are.

So, without further ado,
    - Are there any places where logging is deficient and sorely needs
improvement?
    - Could current logs be improved any amount or presented in a more
readable fashion?
    - Would some form of log visualisation be useful (for example in
something approximating a sequence diagram)?

Feel free to suggest anything which the list above doesn't allude to
which you think would be helpful.

Cheers,
Ivan
