Java Broker Exception Handling and Logging.

Rupert Smith Thu, 17 May 2007 02:44:38 -0700

Arnaud,

I have spoken to Rob about what we think a good exception handling and
logging solution will look like. I've written this up in note form, which I
am now sharing with the dev list. I'd quite happily just get on with this
refactoring effort, however, I notice that you have already done some
refactoring on trunk, introducing some new exception classes instead of
abusing AMQException in every moment of uncertainty, which seems correct to
me. We need to align our thinking on this, so as not to tread on each others
toes.


We know there are some things about the existing exception handling that are
not right. In particular when the broker encounters unrecoverable
conditions, it will often go limping on in an unknown state, when really it
should simply terminate the JVM.

There is also some dodgy logging code, spread throughout the codebase, that
causes exceptions to be logged mutliple times, in constructors, on creation,
in catch and rethrow scenarios and so on; I've mailed about this before. I
had to write a document explaining to sys admins what to do when they see
various log messages, so as a side to this I have compiled a list of todo's
to eliminate the dodgy logging. Logging will be done where exceptions are
handled and never where they are created (at least logging above the debug
level, which is for devs only and can occur anywhere we like).

Some thoughts:

I think that InternalErrorException should be dropped? It seems likely that
it will be abused, as AMQException has been, as a general
don't-know-what-to-do-with-it exception class; it is too broad to be a
usefull checked exception. If the catcher can't get a more specific
exception, it is unlikely to be able to be recovered from, so would be
better throwing a runtime anyway. InternlErrorException might make a good
root for the exception hierarchy of checked exceptions internal to the
broker only? for the case where a handler wants to catch all checked
internal exceptions. However, I can't think where we might actually want to
do that? For checked exceptions that are handled, the handling code is
likely to be not too far away from the throw point, and quite specific to
the exception type.

I can sort of see where you are going with InternalErrorException, as I
agree a seperate hierarchy to AMQException is needed, but there is now a
comical situation, where AMQException is caught and rethrown as
InternalError, which is then caught and rethrown as AMQException... (which
is then caught and logged and swallowed and the broker carries on).

It would have been better just to replace the original source (a JMX error
in the case I am looking at) of the unrecoverable condition with a throw
runtime, then we don't need to worry about all this juggling to fit method
signatures. If that runtime comes up in testing, then a suitable
handling/recovery strategy for the JMX error would be written, if it never
comes up, it might always be left as a runtime.

I think AMQException needs to be reserved just for protocol specific
exceptions, with an AMQP status code. It should be made abstract, and its
constructor should accept a non-null type safe enumeration of an AMQP status
code.

We should follow the Java convention of using runtime exceptions for
conditions that are not expected to be recoverable from, checked exceptions
for recoverable conditions. Recoverable here also means that a compensating
action can be performed, for example, an internal checked exception may not
be recovered from in the sense that the original processing pathway can be
re-attained, the compensating action may be to send an error code to the
client.

Some heuristics for deciding what exception to throw in what scenario might
be:

In *MethodHandler code, dealing directly with the protocol, raise error
conditions as specific AMQExceptions with the appropriate status code.
In code further down the call tree, not directly dealing with the protocol,
if the error condition can be handled, throw a checked exception for it.
For error conditions that you don't know what to do about and cannot
obviously be handled, throw a runtime (or might like to use a more specific
sub-type of Runtime, for example, IllegalArgumentException).

So, I think its correct that you have made message store throw a checked
QueueDoesntExistException, if a method handler looks up a non-existant
queue, it might translate that into a 404. Other code that calls that
method, for example during start-up and configuration, will deal with that
exception in its own way, and not as a 404, since it doesn't go through the
protocol.

One thing I'm keen on is making exceptions only have one constructor, and
passing in optional arguments as nulls. Its not a huge deal, its just that I
like to Javadoc stuff and its a pain to write three lots of Javadoc and
contructors for such trivial things as exceptions. I never understand why it
is that exceptions in particular seem to end up with huge numbers of
different constructors for every permutation of options. AMQException has 4
and thats after I got rid of another 3! On that note, I think a bit of
javadoc on all exception classes explaining the conditions under which the
exception may be thrown, is extremely usefull.

Here are my notes:

--------

Put in top level handlers to correctly handle all exceptions. One already
exists at an approriate point in some mina related code, need one at the top
of the worker thread pool too. There will also be a top-level handler point
somewhere in the Main class to ensure all start up errors are logged (might
already be one).

Top level handlers to:

Shutdown JVM on all unhandled exceptions and error conditions.
Be responsible for logging all unhandled exceptions, checked and runtime. An
attempt to log errors may be made, but of course, may fail due to the
uncertain state of the JVM at that time.
If appropriate to the positioning of the top-level handler, handle AMQ
status code related exceptions by sending the status code to client, and
taking action as appropriate in the protocol, that is, possibly closing the
channel or connection.

The exception hierarchy needs some adjustment. AMQException originally
intended only for AMQP errors with related status codes that get returned to
the client, but has been abused as a general-don't-know-what-to-do-with-it
exception class. Re-establish it as the root of an exception hierarchy just
for AMQP errors. Add internal, non-AMQP exceptions to a different hierarchy.


Many unhandled conditions currently thrown as AMQException may need to be
changed to throw runtimes, as they are not possible to recover from.

Make all exceptions have a single constructor, that accepts nulls for
optional parameters. It makes writing sub-classes less tedious.

Create two new logging hierarchies. One for 'business' or operate level
logging. One for AMQP wire level logging. Classes that output debug logging
for developers as well as wire or operate logging will have multiple
loggers.

AMQP wire level logging could use the AMQP class.method hierarchy to log all
wire level stuff to. It should log to the trace level as it will only ever
be used for fairly low level debugging work. Seperate hierarchy makes it
easy to isolate and watch the protocol traffic on the wire.

The operate level logging may have some sort of hierarchy? Or just a root,
that is at least different to 'org.apache.qpid', for example 'qpid.operate'.
That will make it easy to write a production log4j file that just pulls out
the logging that operators would want to know about.

Operate level logging to use well known identifiers for all errors, to be
included in the log output. Something like 'QPID-XX', with XX an error code.
All error codes to be defined as a type-safe enum, with code, format string
and number of arguments defined. All operator exceptions to get their codes
from this one file, making it easy to add new codes without accidentally
re-using already assigned numbers. (Could generate this from
xml/properties/other so that code definitions are shared between Java and
C++, but will only do that if/when C++ wants to use same logging codes).
Makes writing the operator logging documentation easier too as no longer
need to grep the entire codebase for logging statements. There will probably
only be 3 calls to log.error in the entire codebase; one at each of the
top-level handler locations identified.

Eliminate arbitrary logging (at info, warn or error level) of exceptions in
constructors, or upon creation, or in catch and rethrow scenarios. All
exceptions to be logged when handled only, that is, at the top level
handler, or in handlers that have specifically been written to
recover/compensate for checked exceptions.

Work on trunk, if there's time merge into M2, otherwise these improvements
are for M3.

--------

Rupert

Java Broker Exception Handling and Logging.

Reply via email to