Re: Defining the behaviour of Proton Engine API under error conditions

2013-04-02 Thread Phil Harvey
Thanks for the response.  It does clarify the Engine's semantics, and the
intended division of responsibility between the Engine and the
application.  I intend to document this soon in a short conceptual summary
under proton/docs/engine/.

I chatted to Keith about this and we're uncertain about some of the details
of the steps that follow an invalid frame being pushed into the Engine. To
illustrate this, we wrote the following pseudo-code for the main loop of a
typical application (e.g. a messaging client), similar to Driver's
pn_connector_process function.

1   tail_buf = pn_transport_tail()
2   tail_capacity = pn_transport_capacity()
3   read = socket_recv(tail_buf, tail_capacity)
4   # ... [1]
5
6   push_err_no = pn_transport_push(read) # see [Q1]
7
8   if (push_err_no < 0)
9     socket_shutdown(SHUTDOWN_READ)
10  end if
11
12  # ... [2]
13
14  head_pending = pn_transport_pending() # see [Q2]
15  if (head_pending > 0)
16    head_buf = pn_transport_head()
17    written = socket_send(head_buf, head_pending)
18    # ... [3]
19
20    pn_transport_pop(written)
21  else if (head_pending < 0)
22    socket_shutdown(SHUTDOWN_WRITE)
23  end if

Elided sections:
[1] A well-behaved application would call pn_transport_close_tail() if
socket_recv() < 0
[2] Application makes use of top half API - pn_session_head(),
pn_work_head() etc.
[3] A well-behaved application would call pn_transport_close_head() if
socket_send() < 0


=== Questions about error handling ===

Imagine that the bytes read from the socket on line 3 represent a valid
frame followed by a frame that is invalid (e.g. because it contains a field
of an unexpected datatype). In this case:

[Q1] Should pn_transport_push return -1 on line 6, thereby signalling that
the application can't push any more bytes into it?

[Q2] On lines 14-21, what is in the transport's outgoing byte queue? We
expect that it would be:
  frame1 corresponding to top-half API calls on line 12
  frame2 ... 
  ...
  the CLOSE frame triggered by the invalid input.

Or maybe the CLOSE frame somehow replaces the other outgoing frames in the
transport's outgoing byte queue?

Note: if the application supports failover, it would subsequently unbind
the transport, create a new socket, create a new transport, and bind the
existing connection to it.
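
To make this concrete, the failover sequence we have in mind would look
roughly like this (same pseudo-code conventions as above; the proton-c
names are our best guesses, so treat it as a sketch rather than a spec):

  # after the error is detected and any final output has been drained
  pn_transport_unbind(transport)   # remote endpoint state reverts to UNINIT
  socket = socket_connect(backup_host)
  transport = pn_transport()
  pn_transport_bind(transport, connection)  # re-use existing top half state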


Phil



On 28 March 2013 16:25, Rafael Schloming r...@alum.mit.edu wrote:

 On Thu, Mar 28, 2013 at 11:16 AM, Phil Harvey p...@philharveyonline.com
 wrote:

  On 28 March 2013 13:17, Rafael Schloming r...@alum.mit.edu wrote:
 
   On Thu, Mar 28, 2013 at 5:31 AM, Rob Godfrey rob.j.godf...@gmail.com
   wrote:
  
On 28 March 2013 02:45, Rafael Schloming r...@alum.mit.edu wrote:
   
 On Wed, Mar 27, 2013 at 6:34 PM, Rob Godfrey 
  rob.j.godf...@gmail.com
 wrote:

  On 27 March 2013 21:16, Rafael Schloming r...@alum.mit.edu
 wrote:
   On Wed, Mar 27, 2013 at 11:53 AM, Keith W 
 keith.w...@gmail.com
 wrote:
  
 
  [..snip..]
  
 
  [..snip..]
 
  
   To answer your question, say there is a framing error/the wire is cut
   (really there isn't any way to know the difference since you could cut
  the
   wire half way through a frame header), the transport interface will
 write
   out the close frame as required by the spec, and it will indicate
 through
    its error interface that an error has occurred, however it won't alter
  any
   of the local/remote states of the top half endpoints. The local states
    remain reflective of the local app's desired state, and the remote
  states
    remain reflective of the remote app's last known desired state. This
 kind
  of
   has to be this way because you don't want to confuse links being
   involuntarily detached because the wire was cut with the remote
 endpoint
   wanting to actively shut down the link.
  
 
  I find this a little confusing.  After the Transport has silently sent
 the
  Close frame, what would the local Application typically do next in order
 to
  get the Engine back to a usable state?
 

 It would unbind the transport. When the transport is unbound, all the
 remote state is cleared, and the app is free to use the connection/endpoint
 data structure as if it had simply built them up into their current state
 explicitly via constructors as opposed to it being the result of network
 interactions.


 
  There is an alternative approach which I would find simpler.  In this
   alternative, the Engine would not implicitly send the Close frame.
  Instead, the Application would explicitly control this by doing the
  following:
 
  - The Application checks the Transport's error state as usual
  - The Application discovers that the Transport is in an error state and
  therefore calls
 Connection.setCondition(errorDetailsObtainedFromTransport)
  followed by Connection.close()
  - The Application calls Transport.output (or pn_transport_head in
  proton-c), causing the Close frame bytes to be produced.
 
  As a Proton Engine developer I would find this simpler to implement.
 

 I'm 

Re: Defining the behaviour of Proton Engine API under error conditions

2013-03-28 Thread Rob Godfrey
On 28 March 2013 02:45, Rafael Schloming r...@alum.mit.edu wrote:

 On Wed, Mar 27, 2013 at 6:34 PM, Rob Godfrey rob.j.godf...@gmail.com
 wrote:

  On 27 March 2013 21:16, Rafael Schloming r...@alum.mit.edu wrote:
   On Wed, Mar 27, 2013 at 11:53 AM, Keith W keith.w...@gmail.com
 wrote:
  
 
  [..snip..]
 
  
   3. How will the application using top half API know an error has
    occurred? What are the application's responsibilities when it learns of
   an error?
  
  
   The transport has an error variable which can be inspected to get more
   information about what has happened. Also, if/when the transport is
  unbound
   from the connection, all of the remote endpoint states will transition
  back
   to UNINIT.
  
   I'm not sure how to answer what the application's responsibilities are.
  That
   seems to depend on the application. It could just decide to shut down
 with
   an error message or it could decide to employ some sort of retry
  strategy,
   e.g. connect to a backup service, create a new transport, and bind it
 to
   the same top half. I'm not sure I'd say it has any hard and fast
   responsibilities per se though.
  
  
 
  I think the point Keith was trying to make here is that in order for a
  component to be compliant with the AMQP specification, anywhere where
   the spec says "The connection/session/link should be
   closed/ended/detached with the X error", is it the application that
   actually has responsibility for setting the local error state and
   calling close() on the appropriate endpoint?  If it is, we are
   placing a burden on the application authors to preserve AMQP
   compliance.
 

 Ah, that makes sense. Thanks for clarifying. I don't recall offhand all the
 places the spec says close/end/detach with an X error, but in situations
 like framing errors or missing/invalid arguments required by the transport
 to properly maintain state, say an out-of-sequence delivery-id, that is
 something the transport would handle automatically. I expect there may be
 some cases that the top half would need to initiate explicitly though, e.g.
 obviously a redirect error would need to be initiated by the top half.


So when you say the transport handles this automatically, how does that work
from the app's perspective?  Is the transport updating the local state of
the endpoint, or the remote state?  If the transport is actually injecting
the closing performative into the outgoing stream (the detach, end, or close)
then it seems somewhat weird if the local app still sees this endpoint as
locally open... on the other hand it is also weird if the local state is now
a shared responsibility between the app and the transport.

Also what about cases where the error is in the opening performative for
the endpoint (open, begin, attach)... if the way to communicate the error is
by responding with the paired closing performative, this may first require
the sending of an opening... For example, if there is an error in an
incoming (unsolicited) open frame - say the container id is null - then the
only way to communicate this back to the other side is to send a valid open
immediately followed by a close frame with the appropriate error code.
Are you saying that the transport would do all this silently without this
being visible to the app?

-- Rob


 --Rafael



Re: Defining the behaviour of Proton Engine API under error conditions

2013-03-28 Thread Phil Harvey
On 28 March 2013 13:17, Rafael Schloming r...@alum.mit.edu wrote:

 On Thu, Mar 28, 2013 at 5:31 AM, Rob Godfrey rob.j.godf...@gmail.com
 wrote:

  On 28 March 2013 02:45, Rafael Schloming r...@alum.mit.edu wrote:
 
   On Wed, Mar 27, 2013 at 6:34 PM, Rob Godfrey rob.j.godf...@gmail.com
   wrote:
  
On 27 March 2013 21:16, Rafael Schloming r...@alum.mit.edu wrote:
 On Wed, Mar 27, 2013 at 11:53 AM, Keith W keith.w...@gmail.com
   wrote:

   
[..snip..]


[..snip..]


 To answer your question, say there is a framing error/the wire is cut
 (really there isn't any way to know the difference since you could cut the
 wire half way through a frame header), the transport interface will write
 out the close frame as required by the spec, and it will indicate through
 its error interface that an error has occurred, however it won't alter any
 of the local/remote states of the top half endpoints. The local states
 remain reflective of the local app's desired state, and the remote states
 remain reflective of the remote app's last known desired state. This kind of
 has to be this way because you don't want to confuse links being
 involuntarily detached because the wire was cut with the remote endpoint
 wanting to actively shut down the link.


I find this a little confusing.  After the Transport has silently sent the
Close frame, what would the local Application typically do next in order to
get the Engine back to a usable state?

There is an alternative approach which I would find simpler.  In this
alternative, the Engine would not implicitly send the Close frame.
Instead, the Application would explicitly control this by doing the
following:

- The Application checks the Transport's error state as usual
- The Application discovers that the Transport is in an error state and
therefore calls Connection.setCondition(errorDetailsObtainedFromTransport)
followed by Connection.close()
- The Application calls Transport.output (or pn_transport_head in
proton-c), causing the Close frame bytes to be produced.
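
In pseudo-code, the alternative amounts to something like the following
(the condition-setting call is a placeholder - I'm not sure of its exact
shape in proton-c):

  if (pn_transport_error(transport))
    # copy the transport's error details onto the connection
    connection_set_condition(error_details_from_transport)  # placeholder name
    pn_connection_close(connection)
    # the Close frame bytes are then produced via pn_transport_pending/head
  end if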

As a Proton Engine developer I would find this simpler to implement.

Moreover, as a Proton Engine user, it gives me a clearer separation of
responsibility between the application and the Engine.

Maybe I'm just not grokking the Engine's philosophy, but on the whole the
Engine API feels like it gives me control over what frames are being
produced (though not *when* - and that's fine by me), so I find the idea of
the Transport layer silently sending a Close rather surprising.

What are people's views on this?

[..snip..]


 --Rafael


Phil


Re: Defining the behaviour of Proton Engine API under error conditions

2013-03-28 Thread Rafael Schloming
On Thu, Mar 28, 2013 at 11:16 AM, Phil Harvey p...@philharveyonline.comwrote:

 On 28 March 2013 13:17, Rafael Schloming r...@alum.mit.edu wrote:

  On Thu, Mar 28, 2013 at 5:31 AM, Rob Godfrey rob.j.godf...@gmail.com
  wrote:
 
   On 28 March 2013 02:45, Rafael Schloming r...@alum.mit.edu wrote:
  
On Wed, Mar 27, 2013 at 6:34 PM, Rob Godfrey 
 rob.j.godf...@gmail.com
wrote:
   
 On 27 March 2013 21:16, Rafael Schloming r...@alum.mit.edu wrote:
  On Wed, Mar 27, 2013 at 11:53 AM, Keith W keith.w...@gmail.com
wrote:
 

 [..snip..]
 

 [..snip..]

 
  To answer your question, say there is a framing error/the wire is cut
  (really there isn't any way to know the difference since you could cut
 the
  wire half way through a frame header), the transport interface will write
  out the close frame as required by the spec, and it will indicate through
  its error interface that an error has occurred, however it won't alter
 any
  of the local/remote states of the top half endpoints. The local states
  remain reflective of the local app's desired state, and the remote states
  remain reflective of the remote app's last known desired state. This kind
 of
  has to be this way because you don't want to confuse links being
  involuntarily detached because the wire was cut with the remote endpoint
  wanting to actively shut down the link.
 

 I find this a little confusing.  After the Transport has silently sent the
 Close frame, what would the local Application typically do next in order to
 get the Engine back to a usable state?


It would unbind the transport. When the transport is unbound, all the
remote state is cleared, and the app is free to use the connection/endpoint
data structure as if it had simply built them up into their current state
explicitly via constructors as opposed to it being the result of network
interactions.



 There is an alternative approach which I would find simpler.  In this
 alternative, the Engine would not implicitly send the Close frame.
 Instead, the Application would explicitly control this by doing the
 following:

 - The Application checks the Transport's error state as usual
 - The Application discovers that the Transport is in an error state and
 therefore calls Connection.setCondition(errorDetailsObtainedFromTransport)
 followed by Connection.close()
 - The Application calls Transport.output (or pn_transport_head in
 proton-c), causing the Close frame bytes to be produced.

 As a Proton Engine developer I would find this simpler to implement.


I'm not sure I follow why this would be any simpler to implement.



 Moreover, as a Proton Engine user, it gives me a clearer separation of
 responsibility between the application and the Engine.

 Maybe I'm just not grokking the Engine's philosophy, but on the whole the
 Engine API feels like it gives me control over what frames are being
 produced (though not *when* - and that's fine by me), so I find the idea of
 the Transport layer silently sending a Close rather surprising.


As a user of the top half, I don't think you should need to be at all aware
of what frames are/aren't sent by the engine. You should only need to think
about the semantics that the top half of the API provide. So I would ask
you, why do you care what frames are/aren't sent by the engine? It seems to
me that all you should care about are the intentions of the remote
application, e.g. did the remote application intend to open/close a link,
session, or connection.

I think it is key to understand the dual nature of the open/close,
begin/end, and attach/detach frames as defined by the protocol. They are
used to express two classes of things, i.e. they are frames that combine
information from multiple distinct layers. They are used at a purely
structural/framing/formatting level to manage on-the-wire constructs such
as frame sizes, channel numbers, handles, etc., concepts that never need to
leak outside the transport. They are also used at a higher semantic level,
e.g. to express an application's intent to establish a link. Not every use
of one of these frames involves both elements, for example if a very small
number of channels is negotiated by the transport, but the app wants to use
a lot of sessions, then the engine implementation might well
detach/reattach sessions on demand in order to share the limited number of
available channels. This would never be visible to the top half, and many
attach/detach frames would be sent without the application doing anything
to trigger it.

I think the particular case of a framing-error style close is somewhat
similar. Mechanically/structurally the close frame is required to conform
to the spec, but in such a case it is not expressing the intent of the
*application* to close the connection, it is expressing the intent of the
*transport* to close the connection.



 What are people's views on this?


The spec doesn't allow you to do anything other than send a close frame
with an appropriate 

Defining the behaviour of Proton Engine API under error conditions

2013-03-27 Thread Keith W
Hi all

Phil and I are tasked with producing a comprehensive set of system
tests for Proton Engine.

The aim is to produce a test suite that will execute against all
Proton implementations thus guaranteeing that all exhibit identical
behaviour and assuring conformance with the AMQP 1.0 specification.

This work has highlighted the need to define how Proton Engine API
behaves under error conditions.

To start a discussion, we have identified below six different
types of error and posed a number of questions regarding the behaviour
of the Engine API. Thoughts?

Regards, Keith.

Background
==

We have identified two sources of test-cases:

A) AMQP specification (parts 1 and 2).  For example, the spec states
(2.4.1) "the open frame can only be sent on channel 0", suggesting a
test case should exercise the path where Proton receives an Open on a
channel number other than 0, and similarly (2.4.2) "[The Close] frame
MUST be the last thing ever written onto a connection" suggests a test
case where Proton receives another frame after the receipt of Close.

B) Proton-API itself suggests test-cases. For example, what if a user
tries to bind a connection to more than one transport, opens too many
channels, or calls connection close on a connection that has already
been closed.


Error conditions


Numbers 1-4 relate to the bottom half (i.e. the transport I/O functions):

1) bytes that do not conform to AMQP 1.0 Part 1 [e.g. inconsistent size/doff]

2) bytes that while constituting a valid frame (conforms to Part
2.3.1) are an invalid AMQP frame (violates Part 2.3.2) [e.g.
frame-body containing a primitive string rather than performative]

3) bytes that constitute a valid AMQP frame (conforms to Part 2.3.2) but:
 3A) performative is malformed [e.g. field with unexpected type or
mandatory with value null]
 3B) performative with additional fields [e.g. a Close with additional fields ]
 3C) frame that breaks an AMQP business rule [e.g. Open received on
non-zero channel]

4) state error [e.g. Begin performative received before Open, Attach
on unknown channel number etc]

Numbers 5-6 relate to the top half (i.e. the functions relating to
Connection, Session, etc):

5) illegal parameters to a method call [e.g.
pn_connection_set_container with null container name]
6) illegal state [e.g. pn_connection_open called twice, pn_session
called on unopened connection, pn_session called too many times etc]

Questions
=

When the bottom half encounters input characterised by 1-4, how does
the bottom half of the API behave? What is the effect on the top half?

1. Will the bottom half continue to accept more input?
2. Will the bottom half continue to produce output?
3. How will the application using the top half API know an error has
occurred? What are the application's responsibilities when it learns of
an error?
4. If a connection is already opened, how (if at all) does the
presence of the error condition affect the connection?

When the top half is used in a manner characterised by 5-6, how does the
top half behave?  What, if any, is the effect on the bottom half?


Re: Defining the behaviour of Proton Engine API under error conditions

2013-03-27 Thread Rafael Schloming
On Wed, Mar 27, 2013 at 11:53 AM, Keith W keith.w...@gmail.com wrote:

 Hi all

 Phil and I are tasked with producing a comprehensive set of system
 tests for Proton Engine.

 The aim is to produce a test suite that will execute against all
 Proton implementations thus guaranteeing that all exhibit identical
 behaviour and assuring conformance with the AMQP 1.0 specification.

 This work has highlighted the need to define how Proton Engine API
 behaves under error conditions.

 To start a discussion, we have identified below six different
 types of error and posed a number of questions regarding the behaviour
 of the Engine API. Thoughts?

 Regards, Keith.

 Background
 ==

 We have identified two sources of test-cases:

 A) AMQP specification (parts 1 and 2).  For example, the spec states
 (2.4.1) "the open frame can only be sent on channel 0", suggesting a
 test case should exercise the path where Proton receives an Open on a
 channel number other than 0, and similarly (2.4.2) "[The Close] frame
 MUST be the last thing ever written onto a connection" suggests a test
 case where Proton receives another frame after the receipt of Close.

 B) Proton-API itself suggests test-cases. For example, what if a user
 tries to bind a connection to more than one transport, opens too many
 channels, or calls connection close on a connection that has already
 been closed.


 Error conditions
 

 Numbers 1-4 relate to the bottom half (i.e. the transport I/O functions):

 1) bytes that do not conform to AMQP 1.0 Part 1 [e.g. inconsistent
 size/doff]

 2) bytes that while constituting a valid frame (conforms to Part
 2.3.1) are an invalid AMQP frame (violates Part 2.3.2) [e.g.
 frame-body containing a primitive string rather than performative]

 3) bytes that constitute a valid AMQP frame (conforms to Part 2.3.2) but:
  3A) performative is malformed [e.g. field with unexpected type or
 mandatory with value null]
  3B) performative with additional fields [e.g. a Close with additional
 fields ]
  3C) frame that breaks an AMQP business rule [e.g. Open received on
 non-zero channel]

 4) state error [e.g. Begin performative received before Open, Attach
 on unknown channel number etc]

 Numbers 5-6 relate to the top half (i.e. the functions relating to
 Connection, Session, etc):

 5) illegal parameters to a method call [e.g.
 pn_connection_set_container with null container name]
 6) illegal state [e.g. pn_connection_open called twice, pn_session
 called on unopened connection, pn_session called too many times etc]


One thing that pops out here is your comment about it being an error to
call pn_session() on an unopened connection. I believe this may indicate a
misunderstanding of a key property of the design.

The top half represents endpoint state. Connections, sessions, links, and
deliveries, are all just data structures. These data structures can be
built up in one of two ways, either directly through the top half API by
calling the various constructors: pn_session(), pn_sender(), pn_receiver(),
pn_delivery(), or they can be built up by binding a transport object to a
connection and then feeding bytes into that transport object. Now for the
bytes that are fed into that transport, there certainly is a constraint
that you can't send a begin frame without already having sent an open
frame, and likewise you can't send an attach frame without already sending
a begin frame, however these constraints are part of the protocol
definition, and have no bearing on how the same endpoint data structures
can be constructed directly through the top half API.

It is perfectly valid to construct a connection, session, and link, create
deliveries on them, supply data for those deliveries, even update and/or
settle those deliveries without ever opening any of the containing
connection/session/links, or indeed without ever binding a transport. It is
the job of the engine to figure out how to translate the current endpoint
state into a valid sequence of protocol primitives. So, for example, if a
transport is bound to a connection with a whole lot of open sessions and
links, but the connection itself isn't open yet, the transport will simply
not send any frames because there is nothing it can legally send until the
connection is opened.
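
For example, something like the following is perfectly legal, even though
nothing has been opened (constructor names as in proton-c; exact
signatures from memory, so take this as a sketch):

  conn = pn_connection()
  ssn = pn_session(conn)
  link = pn_sender(ssn, "my-sender")
  # all of the above are just data structures at this point
  transport = pn_transport()
  pn_transport_bind(transport, conn)
  # no frames are emitted until pn_connection_open(conn) is called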



 Questions
 =


So your questions below are best answered in the context of the new
transport interface, so I'm going to describe that a bit first:

 +----------+          +----------+          +-----------+
 |          |  Input   |          |  Tail    |           |
 |          |--------->|          |--------->|           |
 |  Socket  |          |  Driver  |          | Transport |
 |          |<---------|          |<---------|           |
 |          |  Output  |          |  Head    |           |
 +----------+          +----------+          +-----------+

If you recall, conceptually we have the 

Re: Defining the behaviour of Proton Engine API under error conditions

2013-03-27 Thread Rafael Schloming
On Wed, Mar 27, 2013 at 4:16 PM, Rafael Schloming r...@alum.mit.edu wrote:

 1. Will the bottom half continue to accept more input?


 In a way this is kind of unimportant to specify. With the way the new
 transport interface works, the driver will read anywhere from 0 up to
 capacity bytes into the transport's tail. Depending on how network reads
 end up being fragmented, this could be end up being a large amount of
 garbage data, a little amount of garbage data, or some amount of good data
 with garbadge somewhere in the middle. In all cases the transport will end
 up going into an error state, possibly writing out some number of detach
 and end frames, almost certainly writing out a close frame with some kind
 of helpful debugging info in it, and then indicating that there will never
 be anymore pending bytes available by returning PN_EOS from
 pn_transport_pending.


I realize I didn't quite finish my thought here. The reason it is
unimportant to specify is that the transport accepts data in chunks. This
is because we don't want the driver to ever have to buffer data on its
own, so when the transport reports it has capacity X to the driver, it has
pretty much already guaranteed to process whatever bytes the driver gives
it up to X amount. It can't really decide to stop after the first few
bytes because they were malformed, it pretty much needs to process the
rest of the garbage by just throwing it away. So a transport implementation
could decide to keep on accepting more garbage bytes until it has written
out it's close frame with it's helpful debugging info, or it could decide
to stop accepting input and leave it to the driver to either discard the
garbage on its own, or do a shutdown of the socket input.

I would say the most friendly thing to do here in a debugging context would
probably be to keep on accepting and throwing away the garbage until the
helpful close frame is written to the wire. In a production environment
it's possible that a more aggressive strategy might be preferred.
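
In driver pseudo-code, the friendly variant would look something like:

  loop
    pending = pn_transport_pending()
    if (pending == PN_EOS)
      # the close frame has been fully produced; no more output will follow
      socket_shutdown(SHUTDOWN_WRITE)
      break
    end if
    written = socket_send(pn_transport_head(), pending)
    pn_transport_pop(written)
  end loop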

--Rafael


Re: Defining the behaviour of Proton Engine API under error conditions

2013-03-27 Thread Rafael Schloming
On Wed, Mar 27, 2013 at 6:34 PM, Rob Godfrey rob.j.godf...@gmail.comwrote:

 On 27 March 2013 21:16, Rafael Schloming r...@alum.mit.edu wrote:
  On Wed, Mar 27, 2013 at 11:53 AM, Keith W keith.w...@gmail.com wrote:
 

 [..snip..]

 
  3. How will the application using top half API know an error has
   occurred? What are the application's responsibilities when it learns of
  an error?
 
 
  The transport has an error variable which can be inspected to get more
  information about what has happened. Also, if/when the transport is
 unbound
  from the connection, all of the remote endpoint states will transition
 back
  to UNINIT.
 
  I'm not sure how to answer what the application's responsibilities are.
 That
  seems to depend on the application. It could just decide to shut down with
  an error message or it could decide to employ some sort of retry
 strategy,
  e.g. connect to a backup service, create a new transport, and bind it to
  the same top half. I'm not sure I'd say it has any hard and fast
  responsibilities per se though.
 
 

 I think the point Keith was trying to make here is that in order for a
 component to be compliant with the AMQP specification, anywhere where
 the spec says "The connection/session/link should be
 closed/ended/detached with the X error", is it the application that
 actually has responsibility for setting the local error state and
 calling close() on the appropriate endpoint?  If it is, we are
 placing a burden on the application authors to preserve AMQP
 compliance.


Ah, that makes sense. Thanks for clarifying. I don't recall offhand all the
places the spec says close/end/detach with an X error, but in situations
like framing errors or missing/invalid arguments required by the transport
to properly maintain state, say an out-of-sequence delivery-id, that is
something the transport would handle automatically. I expect there may be
some cases that the top half would need to initiate explicitly though, e.g.
obviously a redirect error would need to be initiated by the top half.

--Rafael