Take a look at the logging page in the docs:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperInternals.html#sc_logging
Some good guidelines in there. Basically we log things at INFO level
that are interesting/informational but not logged so frequently that
they fill the log. WARN is for things that are bad but that we can
handle (like a network connectivity failure). ERROR is generally for
things we don't expect and are unlikely to be able to handle. FATAL
means something really bad happened and we shut down the server. Many
end users log only at WARN level or higher in production, so typically
we err on the side of WARN
for issues (so that we have a shot at debugging after the fact). Over
time, as we gain confidence in production environments, we've been
pushing more things that were WARN down to INFO.
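As a rough sketch, the rules of thumb above could be written down in Java like this. The class, enum, and parameter names are made up for illustration; none of this is ZooKeeper code, and real decisions will need more judgment than three booleans.

```java
// Hypothetical helper, not part of ZooKeeper. It only encodes the
// rules of thumb described above.
final class LogLevelGuide {
    enum Level { INFO, WARN, ERROR, FATAL }

    /**
     * @param problem        true if the event is a problem rather than
     *                       routine, interesting information
     * @param recoverable    true if the server can handle it and carry on
     * @param forcesShutdown true if the server must shut down as a result
     */
    static Level choose(boolean problem, boolean recoverable, boolean forcesShutdown) {
        if (forcesShutdown) {
            return Level.FATAL;   // really bad: the server shuts down
        }
        if (!problem) {
            return Level.INFO;    // informational and infrequent
        }
        // Bad but handled -> WARN; unexpected and unhandled -> ERROR.
        return recoverable ? Level.WARN : Level.ERROR;
    }

    public static void main(String[] args) {
        System.out.println(choose(false, true, false)); // INFO
        System.out.println(choose(true, true, false));  // WARN, e.g. a network connectivity failure
        System.out.println(choose(true, false, false)); // ERROR
        System.out.println(choose(true, false, true));  // FATAL
    }
}
```

When in doubt between INFO and WARN for an issue, the guideline above suggests WARN, since many production deployments drop everything below it.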
I fixed a number of JIRAs for 3.3 related to logging. In particular I
cleaned up the client session logging significantly. The most fertile
area right now to clean up logging is in the quorum code. That code in
particular has issues wrt providing sufficient information to debug
error conditions. You can easily see this by starting an ensemble of
more than one machine and then killing one or more of the servers. There
are many places where the logging is insufficient (e.g. "got vote", which
doesn't say what the vote was or what the effect of such a vote is,
etc.). Having improved logging in this area would really help.
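To make the "got vote" complaint concrete, here is a minimal sketch of a more informative message. The class name, method name, parameters, and message wording are all assumptions for illustration; they are not the actual fields or log format used by the quorum code.

```java
// Hypothetical example, not ZooKeeper code: builds a vote message that
// says what the vote was and what receiving it caused.
final class VoteLogExample {
    static String describeVote(long fromServer, long proposedLeader,
                               long proposedZxid, long round, String effect) {
        return String.format(
            "Got vote from server %d: proposed leader=%d, zxid=0x%x, round=%d; effect: %s",
            fromServer, proposedLeader, proposedZxid, round, effect);
    }

    public static void main(String[] args) {
        // Prints:
        // Got vote from server 2: proposed leader=3, zxid=0x100000005, round=1; effect: updated own proposal to server 3
        System.out.println(describeVote(2, 3, 0x100000005L, 1,
                "updated own proposal to server 3"));
    }
}
```

The point is only that the message carries who sent the vote, what it proposed, and what the receiver did about it, so the log can be read without opening the source.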
Try searching the JIRA
https://issues.apache.org/jira/browse/ZOOKEEPER
for open/closed issues mentioning "log4j", "logging", or "log" for further insight.
Patrick
Benjamin Reed wrote:
awesome! that would be great ivan. i'm sure pat has some more concrete
suggestions, but one simple thing to do is to run the unit tests and
look at the log messages that get output. there are a couple of
categories of things that need to be fixed (this is in no way exhaustive):
1) messages that have useful information, but only if you look in the
code to figure out what they mean. there are some leader election
messages that fall into this category. it would be nice to clarify them.
2) there are error messages that really aren't errors. when shutting
down there are a bunch of errors that are expected, but still logged,
for example.
3) misclassified error levels
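One possible way to handle category 2 is to track whether a shutdown is in progress and pick a lower level for failures that are expected at that point. This is only a sketch: the class and method names are hypothetical, and DEBUG versus WARN is an assumed choice, not an existing ZooKeeper convention.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not ZooKeeper code: picks a log level for a
// closed-connection failure depending on whether shutdown has begun.
final class ShutdownAwareLogging {
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    /** Call once when an orderly shutdown starts. */
    void beginShutdown() {
        shuttingDown.set(true);
    }

    /** Level name to use when a connection is found closed. */
    String levelForClosedConnection() {
        // During shutdown a closed connection is expected, so it should
        // not show up as an error; otherwise it deserves attention.
        return shuttingDown.get() ? "DEBUG" : "WARN";
    }
}
```

A caller would check `levelForClosedConnection()` in the exception handler and log accordingly, so expected shutdown noise no longer looks like a real failure.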
welcome aboard!
ben
On 03/29/2010 10:07 AM, Ivan Kelly wrote:
Hi,
I'm going to be using ZooKeeper quite extensively for a project in a
few weeks, but development hasn't kicked off yet. This means I have
some time on my hands and I'd like to get familiar with ZooKeeper
beforehand, perhaps by writing some tools that make debugging problems
with it easier, to save myself some time in the future. The problem
is I haven't had to debug many ZooKeeper problems yet, so I don't know
where the pain points are.
So, without further ado,
- Are there any places where logging is deficient and sorely needs
improvement?
- Could the current logs be improved or presented in a more
readable fashion?
- Would some form of log visualisation be useful (for example in
something approximating a sequence diagram)?
Feel free to suggest anything which the list above doesn't allude to
which you think would be helpful.
Cheers,
Ivan