Arnaud, I have spoken to Rob about what we think a good exception handling and logging solution will look like. I've written this up in note form, which I am now sharing with the dev list. I'd quite happily just get on with this refactoring effort, however, I notice that you have already done some refactoring on trunk, introducing some new exception classes instead of abusing AMQException in every moment of uncertainty, which seems correct to me. We need to align our thinking on this, so as not to tread on each others toes.
We know there are some things about the existing exception handling that are not right. In particular when the broker encounters unrecoverable conditions, it will often go limping on in an unknown state, when really it should simply terminate the JVM. There is also some dodgy logging code, spread throughout the codebase, that causes exceptions to be logged mutliple times, in constructors, on creation, in catch and rethrow scenarios and so on; I've mailed about this before. I had to write a document explaining to sys admins what to do when they see various log messages, so as a side to this I have compiled a list of todo's to eliminate the dodgy logging. Logging will be done where exceptions are handled and never where they are created (at least logging above the debug level, which is for devs only and can occur anywhere we like). Some thoughts: I think that InternalErrorException should be dropped? It seems likely that it will be abused, as AMQException has been, as a general don't-know-what-to-do-with-it exception class; it is too broad to be a usefull checked exception. If the catcher can't get a more specific exception, it is unlikely to be able to be recovered from, so would be better throwing a runtime anyway. InternlErrorException might make a good root for the exception hierarchy of checked exceptions internal to the broker only? for the case where a handler wants to catch all checked internal exceptions. However, I can't think where we might actually want to do that? For checked exceptions that are handled, the handling code is likely to be not too far away from the throw point, and quite specific to the exception type. I can sort of see where you are going with InternalErrorException, as I agree a seperate hierarchy to AMQException is needed, but there is now a comical situation, where AMQException is caught and rethrown as InternalError, which is then caught and rethrown as AMQException... (which is then caught and logged and swallowed and the broker carries on). It would have been better just to replace the original source (a JMX error in the case I am looking at) of the unrecoverable condition with a throw runtime, then we don't need to worry about all this juggling to fit method signatures. If that runtime comes up in testing, then a suitable handling/recovery strategy for the JMX error would be written, if it never comes up, it might always be left as a runtime. I think AMQException needs to be reserved just for protocol specific exceptions, with an AMQP status code. It should be made abstract, and its constructor should accept a non-null type safe enumeration of an AMQP status code. We should follow the Java convention of using runtime exceptions for conditions that are not expected to be recoverable from, checked exceptions for recoverable conditions. Recoverable here also means that a compensating action can be performed, for example, an internal checked exception may not be recovered from in the sense that the original processing pathway can be re-attained, the compensating action may be to send an error code to the client. Some heuristics for deciding what exception to throw in what scenario might be: In *MethodHandler code, dealing directly with the protocol, raise error conditions as specific AMQExceptions with the appropriate status code. In code further down the call tree, not directly dealing with the protocol, if the error condition can be handled, throw a checked exception for it. For error conditions that you don't know what to do about and cannot obviously be handled, throw a runtime (or might like to use a more specific sub-type of Runtime, for example, IllegalArgumentException). So, I think its correct that you have made message store throw a checked QueueDoesntExistException, if a method handler looks up a non-existant queue, it might translate that into a 404. Other code that calls that method, for example during start-up and configuration, will deal with that exception in its own way, and not as a 404, since it doesn't go through the protocol. One thing I'm keen on is making exceptions only have one constructor, and passing in optional arguments as nulls. Its not a huge deal, its just that I like to Javadoc stuff and its a pain to write three lots of Javadoc and contructors for such trivial things as exceptions. I never understand why it is that exceptions in particular seem to end up with huge numbers of different constructors for every permutation of options. AMQException has 4 and thats after I got rid of another 3! On that note, I think a bit of javadoc on all exception classes explaining the conditions under which the exception may be thrown, is extremely usefull. Here are my notes: -------- Put in top level handlers to correctly handle all exceptions. One already exists at an approriate point in some mina related code, need one at the top of the worker thread pool too. There will also be a top-level handler point somewhere in the Main class to ensure all start up errors are logged (might already be one). Top level handlers to: Shutdown JVM on all unhandled exceptions and error conditions. Be responsible for logging all unhandled exceptions, checked and runtime. An attempt to log errors may be made, but of course, may fail due to the uncertain state of the JVM at that time. If appropriate to the positioning of the top-level handler, handle AMQ status code related exceptions by sending the status code to client, and taking action as appropriate in the protocol, that is, possibly closing the channel or connection. The exception hierarchy needs some adjustment. AMQException originally intended only for AMQP errors with related status codes that get returned to the client, but has been abused as a general-don't-know-what-to-do-with-it exception class. Re-establish it as the root of an exception hierarchy just for AMQP errors. Add internal, non-AMQP exceptions to a different hierarchy. Many unhandled conditions currently thrown as AMQException may need to be changed to throw runtimes, as they are not possible to recover from. Make all exceptions have a single constructor, that accepts nulls for optional parameters. It makes writing sub-classes less tedious. Create two new logging hierarchies. One for 'business' or operate level logging. One for AMQP wire level logging. Classes that output debug logging for developers as well as wire or operate logging will have multiple loggers. AMQP wire level logging could use the AMQP class.method hierarchy to log all wire level stuff to. It should log to the trace level as it will only ever be used for fairly low level debugging work. Seperate hierarchy makes it easy to isolate and watch the protocol traffic on the wire. The operate level logging may have some sort of hierarchy? Or just a root, that is at least different to 'org.apache.qpid', for example 'qpid.operate'. That will make it easy to write a production log4j file that just pulls out the logging that operators would want to know about. Operate level logging to use well known identifiers for all errors, to be included in the log output. Something like 'QPID-XX', with XX an error code. All error codes to be defined as a type-safe enum, with code, format string and number of arguments defined. All operator exceptions to get their codes from this one file, making it easy to add new codes without accidentally re-using already assigned numbers. (Could generate this from xml/properties/other so that code definitions are shared between Java and C++, but will only do that if/when C++ wants to use same logging codes). Makes writing the operator logging documentation easier too as no longer need to grep the entire codebase for logging statements. There will probably only be 3 calls to log.error in the entire codebase; one at each of the top-level handler locations identified. Eliminate arbitrary logging (at info, warn or error level) of exceptions in constructors, or upon creation, or in catch and rethrow scenarios. All exceptions to be logged when handled only, that is, at the top level handler, or in handlers that have specifically been written to recover/compensate for checked exceptions. Work on trunk, if there's time merge into M2, otherwise these improvements are for M3. -------- Rupert
