On Wed, Mar 27, 2013 at 11:53 AM, Keith W <keith.w...@gmail.com> wrote:
> Hi all
>
> Phil and I are tasked with producing a comprehensive set of system
> tests for Proton Engine.
>
> The aim is to produce a test suite that will execute against all
> Proton implementations, thus guaranteeing that all exhibit identical
> behaviour and assuring conformance with the AMQP 1.0 specification.
>
> This work has highlighted the need to define how the Proton Engine
> API behaves under error conditions.
>
> To start a discussion, we have identified below six different types
> of error and posed a number of questions regarding the behaviour of
> the Engine API. Thoughts?
>
> Regards, Keith.
>
> Background
> ==========
>
> We have identified two sources of test-cases:
>
> A) AMQP specification (parts 1 and 2). For example, the spec states
> (2.4.1) "the open frame can only be sent on channel 0", suggesting a
> test case should exercise the path where Proton receives an Open on a
> channel number other than 0; similarly, (2.4.2) "[The Close] frame
> MUST be the last thing ever written onto a connection" suggests a
> test case where Proton receives another frame after the receipt of
> Close.
>
> B) The Proton API itself suggests test-cases. For example, what if a
> user tries to bind a connection to more than one transport, opens too
> many channels, or calls connection close on a connection that has
> already been closed?
>
> Error conditions
> ================
>
> Numbers 1-4 relate to the bottom half (i.e. the transport I/O
> functions):
>
> 1) bytes that do not conform to AMQP 1.0 Part 1 [e.g. inconsistent
> size/doff]
>
> 2) bytes that, while constituting a valid frame (conforms to Part
> 2.3.1), are an invalid AMQP frame (violates Part 2.3.2) [e.g.
> frame-body containing a primitive string rather than a performative]
>
> 3) bytes that constitute a valid AMQP frame (conforms to Part 2.3.2)
> but:
>    3A) performative is malformed [e.g. field with unexpected type, or
>    mandatory field with value null]
>    3B) performative with additional fields [e.g. a Close with
>    additional fields]
>    3C) frame that breaks an AMQP business rule [e.g. Open received on
>    a non-zero channel]
>
> 4) state error [e.g. Begin performative received before Open, Attach
> on unknown channel number, etc.]
>
> Numbers 5-6 relate to the top half (i.e. the functions relating to
> Connection, Session, etc.):
>
> 5) illegal parameters to a method call [e.g.
> pn_connection_set_container with null container name]
>
> 6) illegal state [e.g. pn_connection_open called twice, pn_session
> called on an unopened connection, pn_session called too many times,
> etc.]

One thing that pops out here is your comment about it being an error to
call pn_session() on an unopened connection. I believe this may
indicate a misunderstanding of a key property of the design.

The top half represents endpoint state. Connections, sessions, links,
and deliveries are all just data structures. These data structures can
be built up in one of two ways: either directly through the top half
API by calling the various constructors (pn_session(), pn_sender(),
pn_receiver(), pn_delivery()), or by binding a transport object to a
connection and then feeding bytes into that transport object.

Now, for the bytes that are fed into that transport, there certainly is
a constraint that you can't send a Begin frame without already having
sent an Open frame, and likewise you can't send an Attach frame without
already having sent a Begin frame. However, these constraints are part
of the protocol definition, and have no bearing on how the same
endpoint data structures can be constructed directly through the top
half API. It is perfectly valid to construct a connection, session, and
link, create deliveries on them, supply data for those deliveries, and
even update and/or settle those deliveries without ever opening any of
the containing connection/session/links, or indeed without ever binding
a transport.
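To make that concrete, here is a minimal hedged sketch in C of building
up endpoint state purely through the top half: nothing is opened and no
transport is ever bound (the link name and delivery tag are invented
for illustration):

```c
#include <proton/engine.h>

int main(void)
{
  /* Build endpoint state directly through the top half constructors. */
  pn_connection_t *conn = pn_connection();
  pn_session_t *ssn = pn_session(conn);   /* fine on an unopened connection */
  pn_link_t *snd = pn_sender(ssn, "example-sender");
  pn_delivery_t *dlv = pn_delivery(snd, pn_dtag("d-0", 3));

  /* Supply data for the delivery. Nothing has been opened and no
   * transport is bound, yet all of this is perfectly valid state. */
  pn_link_send(snd, "hello", 5);

  (void)dlv;
  pn_connection_free(conn);
  return 0;
}
```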
It is the job of the engine to figure out how to translate the current
endpoint state into a valid sequence of protocol primitives. So, for
example, if a transport is bound to a connection with a whole lot of
open sessions and links, but the connection itself isn't open yet, the
transport will simply not send any frames, because there is nothing it
can legally send until the connection is opened.

> Questions
> =========

Your questions below are best answered in the context of the new
transport interface, so I'm going to describe that a bit first:

  +-------------+          +-------------+          +-------------+
  |             |  Input   |             |   Tail   |             |
  |             |--------->|             |--------->|             |
  |   Socket    |          |   Driver    |          |  Transport  |
  |             |<---------|             |<---------|             |
  |             |  Output  |             |   Head   |             |
  +-------------+          +-------------+          +-------------+

If you recall, conceptually we have the driver reading data available
in the socket's input buffer into free capacity at the tail of the
transport, and also writing pending bytes at the head of the transport
into free space available in the socket's output buffer.

Now it's important to understand the different possible conditions that
can arise and how each component will/should behave.
On the socket side of things we have the following possibilities:

 - the socket input can go into a state where it will never produce any
   more bytes
   + there are two sub-variants of this state: it can happen normally,
     and it can happen with some kind of error code returned by the
     socket API
 - the socket output can go into a state where it will never accept any
   more bytes
   + this can happen due to some kind of error, or it can happen
     because the driver explicitly closes the output

On the transport side of things we have the following possibilities:

 - the tail can go into a state where it will never have free space for
   more input
 - the head can go into a state where it will never again produce
   pending bytes

Given the above, there are a few things we can say about driver
behaviour:

 - if the head goes into a state where it will never again produce
   pending bytes, the driver should shut down the socket output
 - if the tail goes into a state where it will never again have
   capacity for input, the driver should shut down the socket input
 - if the socket input goes into a state where it will never produce
   any more bytes for the transport, the driver should inform the
   transport that it will never get any more bytes
 - if the socket output goes into a state where it will never be able
   to write any more bytes produced by the transport, the driver should
   inform the transport

The above is all captured in the API for the transport.
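Those four rules suggest a driver loop shaped roughly like the
following hedged sketch. Note the byte-moving calls
(pn_transport_tail/head/process/pop) and the socket handling are my
assumptions about the surrounding plumbing, not something the rules
above mandate:

```c
#include <proton/engine.h>
#include <sys/socket.h>
#include <unistd.h>

/* One pump iteration applying the driver rules to a connected socket. */
void pump(int sock, pn_transport_t *transport)
{
  ssize_t capacity = pn_transport_capacity(transport);
  if (capacity > 0) {
    ssize_t n = read(sock, pn_transport_tail(transport), capacity);
    if (n > 0) {
      pn_transport_process(transport, n);
    } else if (n == 0) {
      /* socket input will never produce any more bytes */
      pn_transport_close_tail(transport);
    }
  } else if (capacity < 0) {
    /* tail will never have capacity again: shut down socket input */
    shutdown(sock, SHUT_RD);
  }

  ssize_t pending = pn_transport_pending(transport);
  if (pending > 0) {
    ssize_t n = write(sock, pn_transport_head(transport), pending);
    if (n > 0) {
      pn_transport_pop(transport, n);
    } else if (n < 0) {
      /* socket output will never accept any more bytes */
      pn_transport_close_head(transport);
    }
  } else if (pending < 0) {
    /* head will never produce pending bytes again (e.g. PN_EOS):
     * shut down socket output */
    shutdown(sock, SHUT_WR);
  }
}
```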
We have close_tail for the driver to inform the transport that the
socket input was closed; we have capacity, which can return an error
code indicating that there will never be any free space; we have
pending, which can return an error code indicating that there will
never be any more available bytes; and of course we have close_head,
which the driver can use to inform the transport that no bytes will
ever be consumed again:

  int pn_transport_close_tail(pn_transport_t *transport);
  ssize_t pn_transport_capacity(pn_transport_t *transport);
  ssize_t pn_transport_pending(pn_transport_t *transport);
  int pn_transport_close_head(pn_transport_t *transport);

> When the bottom half encounters input characterised by 1-4, how does
> the bottom half of the API behave? What is the effect on the top
> half?
>
> 1. Will the bottom half continue to accept more input?

In a way this is kind of unimportant to specify. With the way the new
transport interface works, the driver will read anywhere from 0 up to
"capacity" bytes into the transport's tail. Depending on how network
reads end up being fragmented, this could end up being a large amount
of garbage data, a small amount of garbage data, or some amount of good
data with garbage somewhere in the middle. In all cases the transport
will end up going into an error state, possibly writing out some number
of Detach and End frames, almost certainly writing out a Close frame
with some kind of helpful debugging info in it, and then indicating
that there will never be any more pending bytes available by returning
PN_EOS from pn_transport_pending.

> 2. Will the bottom half continue to produce output?

Yes, see above.

> 3. How will the application using the top half API know an error has
> occurred? What are the application's responsibilities when it learns
> of an error?

The transport has an error variable which can be inspected to get more
information about what has happened.
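For instance, a driver might surface that along these lines (a hedged
sketch; I'm assuming accessors in the shape of pn_transport_error and
pn_error_text for reading the error variable):

```c
#include <proton/engine.h>
#include <proton/error.h>
#include <stdio.h>

/* Once pending reports PN_EOS, inspect the transport's error
 * variable for the details of what went wrong. */
void report_transport_error(pn_transport_t *transport)
{
  if (pn_transport_pending(transport) == PN_EOS) {
    pn_error_t *err = pn_transport_error(transport);
    fprintf(stderr, "transport dead: %s\n", pn_error_text(err));
  }
}
```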
Also, if/when the transport is unbound from the connection, all of the
remote endpoint states will transition back to UNINIT.

I'm not sure how to answer what the application's responsibilities are;
that seems to depend on the application. It could just decide to shut
down with an error message, or it could decide to employ some sort of
retry strategy, e.g. connect to a backup service, create a new
transport, and bind it to the same top half. I'm not sure I'd say it
has any hard and fast responsibilities per se, though.

> 4. If a connection is already opened, how (if at all) does the
> presence of the error condition affect the connection?

Basically, at some point the transport can no longer process any more
input data, so it's kind of the same thing as if the wire were cut and
that input data were simply no longer available to process. Of course,
the fact that it was likely a programming error that led to that
circumstance would probably influence how you might react, e.g. you
wouldn't try to reconnect (to the same implementation at least) and do
the same operation again, but logically the top half should still be
available and reflect the endpoint state as of the last valid bytes
that were processed. In fact, you could imagine failover in a situation
like this, so long as you were failing over to a different vendor.

> When the top half is used in a manner characterised by 5-6, how does
> the top half behave? What, if any, is the effect on the bottom half?

For 5 I would expect there to be no effect on the top half or the
bottom half, i.e. whenever possible we should pre-examine all input
parameters and ensure that we can successfully complete whatever it is
we're being asked to do before we actually go about changing any
internal state. This may of course not always be practical, but I'd
have to consider any exceptions on a case-by-case basis.
Likewise with 6, although I don't believe any of the examples given
there actually constitute illegal states for the endpoint data
structures, with the exception of calling pn_session() too many times.
I don't believe that is really a state error; it's more of a resource
error, akin to calling malloc too many times or getting an out of
memory exception in Java. (To be clear, the number of sessions held by
the top half data structure is not limited by the max channel
negotiated at the wire level; the max channel only impacts how many
sessions can be attached simultaneously, so the only real limit on the
number of calls to pn_session() is how much memory you have.)

--Rafael