Re: Defining the behaviour of Proton Engine API under error conditions
Thanks for the response. It does clarify the Engine's semantics and the intended division of responsibility between the Engine and the application. I intend to document this soon in a short conceptual summary under proton/docs/engine/.

I chatted to Keith about this and we're uncertain about some of the details of the steps that follow an invalid frame being pushed into the Engine. To illustrate this, we wrote the following pseudo-code for the main loop of a typical application (e.g. a messaging client), similar to Driver's pn_connector_process function.

 1  tail_buf = pn_transport_tail()
 2  tail_capacity = pn_transport_capacity()
 3  read = socket_recv(tail_buf, tail_capacity)
 4  # ... [1]
 5
 6  push_err_no = pn_transport_push(read)  # see [Q1]
 7
 8  if (push_err_no < 0)
 9      socket_shutdown(SHUTDOWN_READ)
10  end if
11
12  # ... [2]
13
14  head_pending = pn_transport_pending()  # see [Q2]
15  if (head_pending > 0)
16      head_buf = pn_transport_head()
17      written = socket_send(head_buf, head_pending)
18      # ... [3]
19
20      pn_transport_pop(written)
21  else if (head_pending < 0)
22      socket_shutdown(SHUTDOWN_WRITE)
23  end if

Elided sections:

[1] A well-behaved application would call pn_transport_close_tail() if socket_recv() <= 0
[2] Application makes use of the top half API - pn_session_head(), pn_work_head() etc
[3] A well-behaved application would call pn_transport_close_head() if socket_send() < 0

=== Questions about error handling ===

Imagine that the bytes read from the socket on line 3 represent a valid frame followed by a frame that is invalid (e.g. because it contains a field of an unexpected datatype). In this case:

[Q1] Should pn_transport_push return -1 on line 6, thereby signalling that the application can't push any more bytes into it?

[Q2] On lines 14-21, what is in the transport's outgoing byte queue? We expect that it would be:

    frame1   (corresponding to top-half API calls on line 12)
    frame2
    ...
    ...and finally the CLOSE frame triggered by the invalid input.
Or maybe the CLOSE frame somehow replaces the other outgoing frames in the transport's outgoing byte queue?

Note: if the application supports failover, it would subsequently unbind the transport, create a new socket, create a new transport, and bind the existing connection to it.

Phil

On 28 March 2013 16:25, Rafael Schloming r...@alum.mit.edu wrote:
On Thu, Mar 28, 2013 at 11:16 AM, Phil Harvey p...@philharveyonline.com wrote:
On 28 March 2013 13:17, Rafael Schloming r...@alum.mit.edu wrote:
On Thu, Mar 28, 2013 at 5:31 AM, Rob Godfrey rob.j.godf...@gmail.com wrote:
On 28 March 2013 02:45, Rafael Schloming r...@alum.mit.edu wrote:
On Wed, Mar 27, 2013 at 6:34 PM, Rob Godfrey rob.j.godf...@gmail.com wrote:
On 27 March 2013 21:16, Rafael Schloming r...@alum.mit.edu wrote:
On Wed, Mar 27, 2013 at 11:53 AM, Keith W keith.w...@gmail.com wrote:

[..snip..]

[..snip..]

To answer your question, say there is a framing error / the wire is cut (really there isn't any way to know the difference, since you could cut the wire half way through a frame header): the transport interface will write out the close frame as required by the spec, and it will indicate through its error interface that an error has occurred; however, it won't alter any of the local/remote states of the top half endpoints. The local states remain reflective of the local app's desired state, and the remote states remain reflective of the remote app's last known desired state. This kind of has to be this way because you don't want to confuse links being involuntarily detached because the wire was cut with the remote endpoint wanting to actively shut down the link.

I find this a little confusing. After the Transport has silently sent the Close frame, what would the local Application typically do next in order to get the Engine back to a usable state?

It would unbind the transport.
When the transport is unbound, all the remote state is cleared, and the app is free to use the connection/endpoint data structures as if it had simply built them up into their current state explicitly via constructors, as opposed to their being the result of network interactions.

There is an alternative approach which I would find simpler. In this alternative, the Engine would not implicitly send the Close frame. Instead, the Application would explicitly control this by doing the following:

- The Application checks the Transport's error state as usual
- The Application discovers that the Transport is in an error state and therefore calls Connection.setCondition(errorDetailsObtainedFromTransport) followed by Connection.close()
- The Application calls Transport.output (or pn_transport_head in proton-c), causing the Close frame bytes to be produced.

As a Proton Engine developer I would find this simpler to implement. I'm
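The failover sequence Phil describes (unbind the dead transport, reconnect, bind the same connection to a fresh transport) might be sketched roughly as follows. This is only an illustration against the proton-c engine API; new_socket_to_backup() is a hypothetical application helper, not a proton function, and exact signatures should be checked against the proton version in use.

```c
/* Sketch of the failover path: the connection object and its endpoint
 * state survive; only the transport is discarded and replaced.
 * new_socket_to_backup() is a hypothetical application-supplied helper. */
#include <proton/engine.h>

int new_socket_to_backup(void);       /* hypothetical: connect to a backup peer */

void failover(pn_connection_t *conn, pn_transport_t *dead, int *sock)
{
    pn_transport_unbind(dead);        /* remote endpoint states revert to UNINIT */
    pn_transport_free(dead);

    *sock = new_socket_to_backup();

    pn_transport_t *fresh = pn_transport();
    pn_transport_bind(fresh, conn);   /* existing local state drives the new peer */
    /* ...resume the normal tail/head pump loop against *sock and fresh... */
}
```

The key design point this illustrates is that the top-half state outlives any individual transport: nothing about the connection, sessions, or links is torn down by the error itself.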
Re: Defining the behaviour of Proton Engine API under error conditions
On 28 March 2013 02:45, Rafael Schloming r...@alum.mit.edu wrote:
On Wed, Mar 27, 2013 at 6:34 PM, Rob Godfrey rob.j.godf...@gmail.com wrote:
On 27 March 2013 21:16, Rafael Schloming r...@alum.mit.edu wrote:
On Wed, Mar 27, 2013 at 11:53 AM, Keith W keith.w...@gmail.com wrote:

[..snip..]

3. How will the application using the top half API know an error has occurred? What are the application's responsibilities when it learns of an error?

The transport has an error variable which can be inspected to get more information about what has happened. Also, if/when the transport is unbound from the connection, all of the remote endpoint states will transition back to UNINIT. I'm not sure how to answer what the application's responsibilities are. That seems to depend on the application. It could just decide to shut down with an error message, or it could decide to employ some sort of retry strategy, e.g. connect to a backup service, create a new transport, and bind it to the same top half. I'm not sure I'd say it has any hard and fast responsibilities per se, though.

I think the point Keith was trying to make here is that, in order for a component to be compliant with the AMQP specification, anywhere the spec says "the connection/session/link should be closed/ended/detached with the X error", is it the application that actually has responsibility for setting the local error state and calling close() on the appropriate endpoint? If it is, we are placing a burden on the application authors to preserve AMQP compliance.

Ah, that makes sense. Thanks for clarifying. I don't recall offhand all the places the spec says close/end/detach with an X error, but in situations like framing errors or missing/invalid arguments required by the transport to properly maintain state (say an out-of-sequence delivery-id), that is something the transport would handle automatically. I expect there may be some cases that the top half would need to initiate explicitly though, e.g.
obviously a redirect error would need to be initiated by the top half.

So when you say the transport handles this automatically, how does that work from the app's perspective? Is the transport updating the local state of the endpoint, or the remote state? If the transport is actually injecting the closing performative into the outgoing stream (the detach, end, or close), then it seems somewhat weird if the local app still sees this endpoint as locally open; on the other hand, it is also weird if the local state is now a shared responsibility between the app and the transport.

Also, what about cases where the error is in the opening performative for the endpoint (open, begin, attach)? If the way to communicate the error is by responding with the paired closing performative, this may first require the sending of an opening. For example, if there is an error in an incoming (unsolicited) open frame - say the container id is null - then the only way to communicate this back to the other side is to send a valid open immediately followed by a close frame with the appropriate error code. Are you saying that the transport would do all this silently, without this being visible to the app?

-- Rob

--Rafael
Re: Defining the behaviour of Proton Engine API under error conditions
On 28 March 2013 13:17, Rafael Schloming r...@alum.mit.edu wrote:
On Thu, Mar 28, 2013 at 5:31 AM, Rob Godfrey rob.j.godf...@gmail.com wrote:
On 28 March 2013 02:45, Rafael Schloming r...@alum.mit.edu wrote:
On Wed, Mar 27, 2013 at 6:34 PM, Rob Godfrey rob.j.godf...@gmail.com wrote:
On 27 March 2013 21:16, Rafael Schloming r...@alum.mit.edu wrote:
On Wed, Mar 27, 2013 at 11:53 AM, Keith W keith.w...@gmail.com wrote:

[..snip..]

[..snip..]

To answer your question, say there is a framing error / the wire is cut (really there isn't any way to know the difference, since you could cut the wire half way through a frame header): the transport interface will write out the close frame as required by the spec, and it will indicate through its error interface that an error has occurred; however, it won't alter any of the local/remote states of the top half endpoints. The local states remain reflective of the local app's desired state, and the remote states remain reflective of the remote app's last known desired state. This kind of has to be this way because you don't want to confuse links being involuntarily detached because the wire was cut with the remote endpoint wanting to actively shut down the link.

I find this a little confusing. After the Transport has silently sent the Close frame, what would the local Application typically do next in order to get the Engine back to a usable state?

There is an alternative approach which I would find simpler. In this alternative, the Engine would not implicitly send the Close frame. Instead, the Application would explicitly control this by doing the following:

- The Application checks the Transport's error state as usual
- The Application discovers that the Transport is in an error state and therefore calls Connection.setCondition(errorDetailsObtainedFromTransport) followed by Connection.close()
- The Application calls Transport.output (or pn_transport_head in proton-c), causing the Close frame bytes to be produced.
As a Proton Engine developer I would find this simpler to implement. Moreover, as a Proton Engine user, it gives me a clearer separation of responsibility between the application and the Engine. Maybe I'm just not grokking the Engine's philosophy, but on the whole the Engine API feels like it gives me control over what frames are being produced (though not *when* - and that's fine by me), so I find the idea of the Transport layer silently sending a Close rather surprising.

What are people's views on this?

[..snip..]

--Rafael

Phil
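For concreteness, the explicit alternative proposed above might look something like the following in proton-c. This is a sketch only: it assumes condition accessors along the lines of pn_connection_condition()/pn_condition_set_name(), and the error name and description used here are purely illustrative.

```c
/* Sketch of the proposed explicit-close alternative: the application,
 * not the transport, records the error on the connection and closes it;
 * the Close frame then comes out of the normal output path. */
#include <proton/engine.h>

void explicit_close(pn_connection_t *conn)
{
    pn_condition_t *cond = pn_connection_condition(conn);
    pn_condition_set_name(cond, "amqp:connection:framing-error"); /* illustrative */
    pn_condition_set_description(cond, "details obtained from the transport");
    pn_connection_close(conn);
    /* Subsequent pn_transport_pending()/pn_transport_head() calls would
     * then yield the Close frame bytes for the driver to write out. */
}
```

Under this scheme the engine never emits a frame the application did not ask for, which is the separation of responsibility being argued for here.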
Re: Defining the behaviour of Proton Engine API under error conditions
On Thu, Mar 28, 2013 at 11:16 AM, Phil Harvey p...@philharveyonline.com wrote:
On 28 March 2013 13:17, Rafael Schloming r...@alum.mit.edu wrote:
On Thu, Mar 28, 2013 at 5:31 AM, Rob Godfrey rob.j.godf...@gmail.com wrote:
On 28 March 2013 02:45, Rafael Schloming r...@alum.mit.edu wrote:
On Wed, Mar 27, 2013 at 6:34 PM, Rob Godfrey rob.j.godf...@gmail.com wrote:
On 27 March 2013 21:16, Rafael Schloming r...@alum.mit.edu wrote:
On Wed, Mar 27, 2013 at 11:53 AM, Keith W keith.w...@gmail.com wrote:

[..snip..]

[..snip..]

To answer your question, say there is a framing error / the wire is cut (really there isn't any way to know the difference, since you could cut the wire half way through a frame header): the transport interface will write out the close frame as required by the spec, and it will indicate through its error interface that an error has occurred; however, it won't alter any of the local/remote states of the top half endpoints. The local states remain reflective of the local app's desired state, and the remote states remain reflective of the remote app's last known desired state. This kind of has to be this way because you don't want to confuse links being involuntarily detached because the wire was cut with the remote endpoint wanting to actively shut down the link.

I find this a little confusing. After the Transport has silently sent the Close frame, what would the local Application typically do next in order to get the Engine back to a usable state?

It would unbind the transport. When the transport is unbound, all the remote state is cleared, and the app is free to use the connection/endpoint data structures as if it had simply built them up into their current state explicitly via constructors, as opposed to their being the result of network interactions.

There is an alternative approach which I would find simpler. In this alternative, the Engine would not implicitly send the Close frame.
Instead, the Application would explicitly control this by doing the following:

- The Application checks the Transport's error state as usual
- The Application discovers that the Transport is in an error state and therefore calls Connection.setCondition(errorDetailsObtainedFromTransport) followed by Connection.close()
- The Application calls Transport.output (or pn_transport_head in proton-c), causing the Close frame bytes to be produced.

As a Proton Engine developer I would find this simpler to implement.

I'm not sure I follow why this would be any simpler to implement.

Moreover, as a Proton Engine user, it gives me a clearer separation of responsibility between the application and the Engine. Maybe I'm just not grokking the Engine's philosophy, but on the whole the Engine API feels like it gives me control over what frames are being produced (though not *when* - and that's fine by me), so I find the idea of the Transport layer silently sending a Close rather surprising.

As a user of the top half, I don't think you should need to be at all aware of what frames are/aren't sent by the engine. You should only need to think about the semantics that the top half of the API provides. So I would ask you: why do you care what frames are/aren't sent by the engine? It seems to me that all you should care about are the intentions of the remote application, e.g. did the remote application intend to open/close a link, session, or connection?

I think it is key to understand the dual nature of the open/close, begin/end, and attach/detach frames as defined by the protocol. They are used to express two classes of things, i.e. they are frames that combine information from multiple distinct layers. They are used at a purely structural/framing/formatting level to manage on-the-wire constructs such as frame sizes, channel numbers, handles, etc. - concepts that never need to leak outside the transport. They are also used at a higher semantic level, e.g.
to express an application's intent to establish a link. Not every use of one of these frames involves both elements. For example, if a very small number of channels is negotiated by the transport, but the app wants to use a lot of sessions, then the engine implementation might well detach/reattach sessions on demand in order to share the limited number of available channels. This would never be visible to the top half, and many attach/detach frames would be sent without the application doing anything to trigger them.

I think the particular case of a framing-error style close is somewhat similar. Mechanically/structurally the close frame is required to conform to the spec, but in such a case it is not expressing the intent of the *application* to close the connection; it is expressing the intent of the *transport* to close the connection.

What are people's views on this?

The spec doesn't allow you to do anything other than send a close frame with an appropriate
Defining the behaviour of Proton Engine API under error conditions
Hi all,

Phil and I are tasked with producing a comprehensive set of system tests for the Proton Engine. The aim is to produce a test suite that will execute against all Proton implementations, thus guaranteeing that all exhibit identical behaviour and assuring conformance with the AMQP 1.0 specification. This work has highlighted the need to define how the Proton Engine API behaves under error conditions. To start a discussion, below we have identified six different types of error and posed a number of questions regarding the behaviour of the Engine API. Thoughts?

Regards, Keith.

Background
==========

We have identified two sources of test-cases:

A) The AMQP specification (parts 1 and 2). For example, the spec states (2.4.1) "the open frame can only be sent on channel 0", suggesting a test case that exercises the path where Proton receives an Open on a channel number other than 0; similarly, (2.4.2) "[The Close] frame MUST be the last thing ever written onto a connection" suggests a test case where Proton receives another frame after the receipt of Close.

B) The Proton API itself suggests test-cases. For example, what if a user tries to bind a connection to more than one transport, opens too many channels, or calls connection close on a connection that has already been closed?

Error conditions
================

Numbers 1-4 relate to the bottom half (i.e. the transport I/O functions):

1) bytes that do not conform to AMQP 1.0 Part 1 [e.g. inconsistent size/doff]
2) bytes that, while constituting a valid frame (conforms to Part 2.3.1), are an invalid AMQP frame (violates Part 2.3.2) [e.g. a frame-body containing a primitive string rather than a performative]
3) bytes that constitute a valid AMQP frame (conforms to Part 2.3.2) but:
   3A) the performative is malformed [e.g. a field with an unexpected type, or a mandatory field with value null]
   3B) the performative has additional fields [e.g. a Close with additional fields]
   3C) the frame breaks an AMQP business rule [e.g. Open received on a non-zero channel]
4) state error [e.g.
Begin performative received before Open, Attach on an unknown channel number, etc.]

Numbers 5-6 relate to the top half (i.e. the functions relating to Connection, Session, etc):

5) illegal parameters to a method call [e.g. pn_connection_set_container with a null container name]
6) illegal state [e.g. pn_connection_open called twice, pn_session called on an unopened connection, pn_session called too many times, etc.]

Questions
=========

When the bottom half encounters input characterised by 1-4, how does the bottom half of the API behave? What is the effect on the top half?

1. Will the bottom half continue to accept more input?
2. Will the bottom half continue to produce output?
3. How will the application using the top half API know an error has occurred? What are the application's responsibilities when it learns of an error?
4. If a connection is already opened, how (if at all) does the presence of the error condition affect the connection?

When the top half is used in a manner characterised by 5-6, how does the top half behave? What, if any, is the effect on the bottom half?
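A bottom-half test for error condition 1 might be sketched like this, using the chunked capacity/tail/process side of the proton-c transport interface discussed later in this thread. This is a sketch only; the exact function names and signatures should be checked against the proton-c headers for the version under test, and the garbage bytes are merely an arbitrary non-AMQP sequence.

```c
/* Sketch of a bottom-half test for error condition 1: feed bytes that
 * are not valid AMQP and observe the transport entering its error state.
 * Assumes the chunked proton-c transport API; signatures may differ
 * between proton versions. */
#include <proton/engine.h>
#include <proton/error.h>
#include <string.h>

void push_garbage(pn_transport_t *transport)
{
    const char garbage[] = "XXXX-NOT-AMQP-XXXX";      /* violates Part 1 framing */
    ssize_t cap = pn_transport_capacity(transport);

    if (cap >= (ssize_t)(sizeof(garbage) - 1)) {
        memcpy(pn_transport_tail(transport), garbage, sizeof(garbage) - 1);
        pn_transport_process(transport, sizeof(garbage) - 1);
    }
    /* The expectation under test: the transport queues a close frame with
     * debugging info and eventually reports PN_EOS from
     * pn_transport_pending(). */
}
```

A test harness would then drain the transport's head and assert that the output ends with a Close frame carrying the expected error condition.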
Re: Defining the behaviour of Proton Engine API under error conditions
On Wed, Mar 27, 2013 at 11:53 AM, Keith W keith.w...@gmail.com wrote:

Hi all,

Phil and I are tasked with producing a comprehensive set of system tests for the Proton Engine. The aim is to produce a test suite that will execute against all Proton implementations, thus guaranteeing that all exhibit identical behaviour and assuring conformance with the AMQP 1.0 specification. This work has highlighted the need to define how the Proton Engine API behaves under error conditions. To start a discussion, below we have identified six different types of error and posed a number of questions regarding the behaviour of the Engine API. Thoughts?

Regards, Keith.

Background
==========

We have identified two sources of test-cases:

A) The AMQP specification (parts 1 and 2). For example, the spec states (2.4.1) "the open frame can only be sent on channel 0", suggesting a test case that exercises the path where Proton receives an Open on a channel number other than 0; similarly, (2.4.2) "[The Close] frame MUST be the last thing ever written onto a connection" suggests a test case where Proton receives another frame after the receipt of Close.

B) The Proton API itself suggests test-cases. For example, what if a user tries to bind a connection to more than one transport, opens too many channels, or calls connection close on a connection that has already been closed?

Error conditions
================

Numbers 1-4 relate to the bottom half (i.e. the transport I/O functions):

1) bytes that do not conform to AMQP 1.0 Part 1 [e.g. inconsistent size/doff]
2) bytes that, while constituting a valid frame (conforms to Part 2.3.1), are an invalid AMQP frame (violates Part 2.3.2) [e.g. a frame-body containing a primitive string rather than a performative]
3) bytes that constitute a valid AMQP frame (conforms to Part 2.3.2) but:
   3A) the performative is malformed [e.g. a field with an unexpected type, or a mandatory field with value null]
   3B) the performative has additional fields [e.g.
a Close with additional fields]
   3C) the frame breaks an AMQP business rule [e.g. Open received on a non-zero channel]
4) state error [e.g. Begin performative received before Open, Attach on an unknown channel number, etc.]

Numbers 5-6 relate to the top half (i.e. the functions relating to Connection, Session, etc):

5) illegal parameters to a method call [e.g. pn_connection_set_container with a null container name]
6) illegal state [e.g. pn_connection_open called twice, pn_session called on an unopened connection, pn_session called too many times, etc.]

One thing that pops out here is your comment about it being an error to call pn_session() on an unopened connection. I believe this may indicate a misunderstanding of a key property of the design. The top half represents endpoint state. Connections, sessions, links, and deliveries are all just data structures. These data structures can be built up in one of two ways: either directly through the top half API, by calling the various constructors - pn_session(), pn_sender(), pn_receiver(), pn_delivery() - or by binding a transport object to a connection and then feeding bytes into that transport object.

Now, for the bytes that are fed into that transport, there certainly is a constraint that you can't send a begin frame without already having sent an open frame, and likewise you can't send an attach frame without already having sent a begin frame; however, these constraints are part of the protocol definition, and have no bearing on how the same endpoint data structures can be constructed directly through the top half API. It is perfectly valid to construct a connection, session, and link, create deliveries on them, supply data for those deliveries, even update and/or settle those deliveries, without ever opening any of the containing connection/session/links, or indeed without ever binding a transport.
It is the job of the engine to figure out how to translate the current endpoint state into a valid sequence of protocol primitives. So, for example, if a transport is bound to a connection with a whole lot of open sessions and links, but the connection itself isn't open yet, the transport will simply not send any frames, because there is nothing it can legally send until the connection is opened.

Questions
=========

Your questions below are best answered in the context of the new transport interface, so I'm going to describe that a bit first:

+-----------+  Input  +-----------+  Tail  +-----------+
|           |-------->|           |------->|           |
|  Socket   |         |  Driver   |        | Transport |
|           |<--------|           |<-------|           |
+-----------+  Output +-----------+  Head  +-----------+

If you recall, conceptually we have the
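The point about building endpoint state independently of any transport can be illustrated roughly like this, as a sketch against the proton-c top-half constructors; the link name is arbitrary:

```c
/* Sketch: endpoint state built entirely through top-half constructors,
 * before any transport exists. No frames are produced at this stage;
 * the engine only emits a legal open/begin/attach sequence once a
 * transport is bound. */
#include <proton/engine.h>

pn_connection_t *build_endpoints(void)
{
    pn_connection_t *conn = pn_connection();
    pn_session_t *ssn = pn_session(conn);         /* valid even though conn is unopened */
    pn_link_t *snd = pn_sender(ssn, "sender-1");  /* arbitrary link name */

    pn_link_open(snd);
    pn_session_open(ssn);
    pn_connection_open(conn);

    /* Nothing has gone on the wire: only a later pn_transport_bind()
     * translates this state into protocol frames, in a spec-legal order. */
    return conn;
}
```

Note that the calls above could be made in any order relative to the open calls; the ordering constraints belong to the protocol, not to the data structures.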
Re: Defining the behaviour of Proton Engine API under error conditions
On Wed, Mar 27, 2013 at 4:16 PM, Rafael Schloming r...@alum.mit.edu wrote:

1. Will the bottom half continue to accept more input?

In a way this is kind of unimportant to specify. With the way the new transport interface works, the driver will read anywhere from 0 up to capacity bytes into the transport's tail. Depending on how network reads end up being fragmented, this could end up being a large amount of garbage data, a little garbage data, or some amount of good data with garbage somewhere in the middle. In all cases the transport will end up going into an error state, possibly writing out some number of detach and end frames, almost certainly writing out a close frame with some kind of helpful debugging info in it, and then indicating that there will never be any more pending bytes available by returning PN_EOS from pn_transport_pending.

I realize I didn't quite finish my thought here. The reason it is unimportant to specify is that the transport accepts data in chunks. This is because we don't want the driver to ever have to buffer data on its own, so when the transport reports it has capacity X to the driver, it has pretty much already guaranteed to process whatever bytes the driver gives it, up to X amount. It can't really decide to stop after the first few bytes because they were malformed; it pretty much needs to process the rest of the garbage by just throwing it away.

So a transport implementation could decide to keep on accepting more garbage bytes until it has written out its close frame with its helpful debugging info, or it could decide to stop accepting input and leave it to the driver to either discard the garbage on its own, or do a shutdown of the socket input. I would say the most friendly thing to do here in a debugging context would probably be to keep on accepting and throwing away the garbage until the helpful close frame is written to the wire.
In a production environment it's possible that a more aggressive strategy might be preferred. --Rafael
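The "keep pumping until the close frame is on the wire" strategy described above might be sketched as follows, using the head/pending/pop side of the proton-c transport API and a plain POSIX socket. This is an illustrative sketch, not a definitive implementation; error handling for partial sends is simplified.

```c
/* Sketch: after an input error, stop reading but keep draining output
 * until the transport signals PN_EOS, so the close frame (with its
 * debugging info) actually reaches the peer. */
#include <proton/engine.h>
#include <proton/error.h>
#include <sys/socket.h>

void drain_output(pn_transport_t *transport, int sock)
{
    ssize_t pending;
    while ((pending = pn_transport_pending(transport)) > 0) {
        const char *head = pn_transport_head(transport);
        ssize_t written = send(sock, head, (size_t)pending, 0);
        if (written < 0) {
            pn_transport_close_head(transport);   /* we can no longer write */
            break;
        }
        pn_transport_pop(transport, (size_t)written);
    }
    /* Here pending == PN_EOS: the transport will never produce more bytes,
     * and the socket can be shut down. */
}
```

The more aggressive production strategy mentioned above would simply skip this loop and shut the socket down in both directions immediately.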
Re: Defining the behaviour of Proton Engine API under error conditions
On Wed, Mar 27, 2013 at 6:34 PM, Rob Godfrey rob.j.godf...@gmail.com wrote:
On 27 March 2013 21:16, Rafael Schloming r...@alum.mit.edu wrote:
On Wed, Mar 27, 2013 at 11:53 AM, Keith W keith.w...@gmail.com wrote:

[..snip..]

3. How will the application using the top half API know an error has occurred? What are the application's responsibilities when it learns of an error?

The transport has an error variable which can be inspected to get more information about what has happened. Also, if/when the transport is unbound from the connection, all of the remote endpoint states will transition back to UNINIT. I'm not sure how to answer what the application's responsibilities are. That seems to depend on the application. It could just decide to shut down with an error message, or it could decide to employ some sort of retry strategy, e.g. connect to a backup service, create a new transport, and bind it to the same top half. I'm not sure I'd say it has any hard and fast responsibilities per se, though.

I think the point Keith was trying to make here is that, in order for a component to be compliant with the AMQP specification, anywhere the spec says "the connection/session/link should be closed/ended/detached with the X error", is it the application that actually has responsibility for setting the local error state and calling close() on the appropriate endpoint? If it is, we are placing a burden on the application authors to preserve AMQP compliance.

Ah, that makes sense. Thanks for clarifying. I don't recall offhand all the places the spec says close/end/detach with an X error, but in situations like framing errors or missing/invalid arguments required by the transport to properly maintain state (say an out-of-sequence delivery-id), that is something the transport would handle automatically. I expect there may be some cases that the top half would need to initiate explicitly though, e.g. obviously a redirect error would need to be initiated by the top half.

--Rafael