On Wed, Mar 27, 2013 at 11:53 AM, Keith W <keith.w...@gmail.com> wrote:
> Hi all
>
> Phil and I are tasked with producing a comprehensive set of system
> tests for Proton Engine.
>
> The aim is to produce a test suite that will execute against all
> Proton implementations, thus guaranteeing that all exhibit identical
> behaviour and assuring conformance with the AMQP 1.0 specification.
>
> This work has highlighted the need to define how the Proton Engine
> API behaves under error conditions.
>
> To start a discussion, we have identified below six different types
> of error and posed a number of questions regarding the behaviour of
> the Engine API. Thoughts?
>
> Regards, Keith.
>
> Background
> ==========
>
> We have identified two sources of test-cases:
>
> A) AMQP specification (parts 1 and 2). For example, the spec states
> (2.4.1) "the open frame can only be sent on channel 0", suggesting a
> test case should exercise the path where Proton receives an Open on a
> channel number other than 0; similarly, (2.4.2) "[The Close] frame
> MUST be the last thing ever written onto a connection" suggests a
> test case where Proton receives another frame after the receipt of
> Close.
>
> B) The Proton API itself suggests test-cases. For example, what if a
> user tries to bind a connection to more than one transport, opens too
> many channels, or calls connection close on a connection that has
> already been closed?
>
> Error conditions
> ================
>
> Numbers 1-4 relate to the bottom half (i.e. the transport I/O
> functions):
>
> 1) bytes that do not conform to AMQP 1.0 Part 1 [e.g. inconsistent
> size/doff]
>
> 2) bytes that, while constituting a valid frame (conforms to Part
> 2.3.1), are an invalid AMQP frame (violates Part 2.3.2) [e.g.
> frame-body containing a primitive string rather than a performative]
>
> 3) bytes that constitute a valid AMQP frame (conforms to Part 2.3.2)
> but:
>    3A) performative is malformed [e.g. field with unexpected type, or
>    mandatory field with value null]
>    3B) performative with additional fields [e.g. a Close with
>    additional fields]
>    3C) frame that breaks an AMQP business rule [e.g. Open received on
>    a non-zero channel]
>
> 4) state error [e.g. Begin performative received before Open, Attach
> on unknown channel number, etc.]
>
> Numbers 5-6 relate to the top half (i.e. the functions relating to
> Connection, Session, etc.):
>
> 5) illegal parameters to a method call [e.g.
> pn_connection_set_container with null container name]
>
> 6) illegal state [e.g. pn_connection_open called twice, pn_session
> called on an unopened connection, pn_session called too many times,
> etc.]

One thing that pops out here is your comment about it being an error to
call pn_session() on an unopened connection. I believe this may
indicate a misunderstanding of a key property of the design.

The top half represents endpoint state. Connections, sessions, links,
and deliveries are all just data structures. These data structures can
be built up in one of two ways: either directly through the top half
API by calling the various constructors (pn_session(), pn_sender(),
pn_receiver(), pn_delivery()), or by binding a transport object to a
connection and then feeding bytes into that transport object.

Now, for the bytes that are fed into that transport, there certainly is
a constraint that you can't send a Begin frame without already having
sent an Open frame, and likewise you can't send an Attach frame without
already having sent a Begin frame. However, these constraints are part
of the protocol definition, and have no bearing on how the same
endpoint data structures can be constructed directly through the top
half API. It is perfectly valid to construct a connection, session, and
link, create deliveries on them, supply data for those deliveries, and
even update and/or settle those deliveries without ever opening any of
the containing connection/session/links, or indeed without ever binding
a transport.
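To make that concrete, here is a minimal hedged sketch in C of building
up endpoint state purely through the top half: nothing is opened and no
transport is ever bound (the link name and delivery tag are invented
for illustration):

```c
#include <proton/engine.h>

int main(void)
{
  /* Build endpoint state directly through the top half constructors. */
  pn_connection_t *conn = pn_connection();
  pn_session_t *ssn = pn_session(conn);   /* fine on an unopened connection */
  pn_link_t *snd = pn_sender(ssn, "example-sender");
  pn_delivery_t *dlv = pn_delivery(snd, pn_dtag("d-0", 3));

  /* Supply data for the delivery. Nothing has been opened and no
   * transport is bound, yet all of this is perfectly valid state. */
  pn_link_send(snd, "hello", 5);

  (void)dlv;
  pn_connection_free(conn);
  return 0;
}
```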
It is the job of the engine to figure out how to translate the current
endpoint state into a valid sequence of protocol primitives. So, for
example, if a transport is bound to a connection with a whole lot of
open sessions and links, but the connection itself isn't open yet, the
transport will simply not send any frames, because there is nothing it
can legally send until the connection is opened.

> Questions
> =========

Your questions below are best answered in the context of the new
transport interface, so I'm going to describe that a bit first:

  +-------------+          +-------------+          +-------------+
  |             |  Input   |             |   Tail   |             |
  |             |--------->|             |--------->|             |
  |   Socket    |          |   Driver    |          |  Transport  |
  |             |<---------|             |<---------|             |
  |             |  Output  |             |   Head   |             |
  +-------------+          +-------------+          +-------------+

If you recall, conceptually we have the driver reading data available
in the socket's input buffer into free capacity at the tail of the
transport, and also writing pending bytes at the head of the transport
into free space available in the socket's output buffer.

Now it's important to understand the different possible conditions that
can arise and how each component will/should behave.
On the socket side of things we have the following possibilities:

 - the socket input can go into a state where it will never produce any
   more bytes
   + there are two sub-variants of this state: it can happen normally,
     and it can happen with some kind of error code returned by the
     socket API
 - the socket output can go into a state where it will never accept any
   more bytes
   + this can happen due to some kind of error, or it can happen
     because the driver explicitly closes the output

On the transport side of things we have the following possibilities:

 - the tail can go into a state where it will never have free space for
   more input
 - the head can go into a state where it will never again produce
   pending bytes

Given the above, there are a few things we can say about driver
behaviour:

 - if the head goes into a state where it will never again produce
   pending bytes, the driver should shut down the socket output
 - if the tail goes into a state where it will never again have
   capacity for input, the driver should shut down the socket input
 - if the socket input goes into a state where it will never produce
   any more bytes for the transport, the driver should inform the
   transport that it will never get any more bytes
 - if the socket output goes into a state where it will never be able
   to write any more bytes produced by the transport, the driver should
   inform the transport

The above is all captured in the API for the transport.
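Those four rules suggest a driver loop shaped roughly like the
following hedged sketch. Note the byte-moving calls
(pn_transport_tail/head/process/pop) and the socket handling are my
assumptions about the surrounding plumbing, not something the rules
above mandate:

```c
#include <proton/engine.h>
#include <sys/socket.h>
#include <unistd.h>

/* One pump iteration applying the driver rules to a connected socket. */
void pump(int sock, pn_transport_t *transport)
{
  ssize_t capacity = pn_transport_capacity(transport);
  if (capacity > 0) {
    ssize_t n = read(sock, pn_transport_tail(transport), capacity);
    if (n > 0) {
      pn_transport_process(transport, n);
    } else if (n == 0) {
      /* socket input will never produce any more bytes */
      pn_transport_close_tail(transport);
    }
  } else if (capacity < 0) {
    /* tail will never have capacity again: shut down socket input */
    shutdown(sock, SHUT_RD);
  }

  ssize_t pending = pn_transport_pending(transport);
  if (pending > 0) {
    ssize_t n = write(sock, pn_transport_head(transport), pending);
    if (n > 0) {
      pn_transport_pop(transport, n);
    } else if (n < 0) {
      /* socket output will never accept any more bytes */
      pn_transport_close_head(transport);
    }
  } else if (pending < 0) {
    /* head will never produce pending bytes again (e.g. PN_EOS):
     * shut down socket output */
    shutdown(sock, SHUT_WR);
  }
}
```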
We have close_tail for the driver to inform the transport that the
socket input was closed; we have capacity, which can return an error
code indicating that there will never be any free space; we have
pending, which can return an error code indicating that there will
never be any more available bytes; and of course we have close_head,
which the driver can use to inform the transport that no bytes will
ever be consumed again:

  int pn_transport_close_tail(pn_transport_t *transport);
  ssize_t pn_transport_capacity(pn_transport_t *transport);
  ssize_t pn_transport_pending(pn_transport_t *transport);
  int pn_transport_close_head(pn_transport_t *transport);

> When the bottom half encounters input characterised by 1-4, how does
> the bottom half of the API behave? What is the effect on the top
> half?
>
> 1. Will the bottom half continue to accept more input?

In a way this is kind of unimportant to specify. With the way the new
transport interface works, the driver will read anywhere from 0 up to
"capacity" bytes into the transport's tail. Depending on how network
reads end up being fragmented, this could end up being a large amount
of garbage data, a small amount of garbage data, or some amount of good
data with garbage somewhere in the middle. In all cases the transport
will end up going into an error state, possibly writing out some number
of Detach and End frames, almost certainly writing out a Close frame
with some kind of helpful debugging info in it, and then indicating
that there will never be any more pending bytes available by returning
PN_EOS from pn_transport_pending.

> 2. Will the bottom half continue to produce output?

Yes, see above.

> 3. How will the application using the top half API know an error has
> occurred? What are the application's responsibilities when it learns
> of an error?

The transport has an error variable which can be inspected to get more
information about what has happened.
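For instance, a driver might surface that along these lines (a hedged
sketch; I'm assuming accessors in the shape of pn_transport_error and
pn_error_text for reading the error variable):

```c
#include <proton/engine.h>
#include <proton/error.h>
#include <stdio.h>

/* Once pending reports PN_EOS, inspect the transport's error
 * variable for the details of what went wrong. */
void report_transport_error(pn_transport_t *transport)
{
  if (pn_transport_pending(transport) == PN_EOS) {
    pn_error_t *err = pn_transport_error(transport);
    fprintf(stderr, "transport dead: %s\n", pn_error_text(err));
  }
}
```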
Also, if/when the transport is unbound from the connection, all of the
remote endpoint states will transition back to UNINIT.

I'm not sure how to answer what the application's responsibilities are;
that seems to depend on the application. It could just decide to shut
down with an error message, or it could decide to employ some sort of
retry strategy, e.g. connect to a backup service, create a new
transport, and bind it to the same top half. I'm not sure I'd say it
has any hard and fast responsibilities per se, though.

> 4. If a connection is already opened, how (if at all) does the
> presence of the error condition affect the connection?

Basically, at some point the transport can no longer process any more
input data, so it's kind of the same thing as if the wire were cut and
that input data were simply no longer available to process. Of course,
the fact that it was likely a programming error that led to that
circumstance would probably influence how you might react, e.g. you
wouldn't try to reconnect (to the same implementation at least) and do
the same operation again, but logically the top half should still be
available and reflect the endpoint state as of the last valid bytes
that were processed. In fact, you could imagine failover in a situation
like this, so long as you were failing over to a different vendor.

> When the top half is used in a manner characterised by 5-6, how does
> the top half behave? What, if any, is the effect on the bottom half?

For 5 I would expect there to be no effect on the top half or the
bottom half, i.e. whenever possible we should pre-examine all input
parameters and ensure that we can successfully complete whatever it is
we're being asked to do before we actually go about changing any
internal state. This may of course not always be practical, but I'd
have to consider any exceptions on a case-by-case basis.
Likewise with 6, although I don't believe any of the examples given
there actually constitute illegal states for the endpoint data
structures, with the exception of calling pn_session() too many times.
I don't believe that is really a state error; it's more of a resource
error, akin to calling malloc too many times or getting an out of
memory exception in Java. (To be clear, the number of sessions held by
the top half data structure is not limited by the max channel
negotiated at the wire level; the max channel only impacts how many
sessions can be attached simultaneously, so the only real limit on the
number of calls to pn_session() is how much memory you have.)

--Rafael