A recent bug (PROTON-222) has exposed an issue with the transport
interface. The details of the bug are documented in the JIRA, but the basic
issue is that given the current transport interface there is no way for an
application to discover via the engine interface when data has been
actually written over the wire vs just sitting there waiting to be written
over the wire.

Note that by "written over the wire" I mean copied from an in-process
buffer that will vanish as soon as the process exits, to a kernel buffer
that will continue to (re)-transmit the data for as long as the machine is
alive, connected to the network, and the remote endpoint is still listening.

To understand the issue, imagine implementing a driver for the current
transport interface. It might allocate a buffer of 1K and then call
pn_transport_output to fill that buffer. The transport might then put say
750 bytes of data into that buffer. Now imagine what happens if the driver
can only write 500 of those bytes to the socket. This will leave the driver
buffering 250 bytes. The engine of course has no way of knowing this and
can only assume that all 750 bytes will get written out.

A somewhat related issue is the buffering ownership/strategy between the
transport and the driver. There are really three basic choices here:

  1) the driver owns the buffer, and the transport does no buffering
  2) the transport owns the buffer, and the driver does no buffering
  3) they both own their own buffers and we copy from one to the other

Now the division between these isn't always static, there are hybrid
strategies and what not, however it's useful to think of these basic cases.
The current transport interface (pn_transport_input/output) and initial
implementation was designed around option (1), the idea behind the
pn_transport_output signature was that the engine could directly encode
into a driver-owned buffer. This, however, turned out to introduce some
unfriendly coupling. Imagine what happens in our hypothetical scenario
above if the driver has a 1K buffer and the engine negotiates a 4K frame
size. The engine might end up getting stuck with a frame that is too large
to encode directly into the driver's buffer. So to make the interface more
friendly, we modified the implementation to do buffering internally if
necessary, thus ending up in some ways closer to option (3).

Now the reason this buffering issue is related to PROTON-222 is that one
way to allow the engine to know whether data is buffered or not is to
redefine the interface around option (2), thereby allowing the engine to
always have visibility into what is/isn't on the wire. This would also in
some cases eliminate some of the extra copying that occurs currently due to
our evolution towards option (3).

Such an interface would look something like this:

    // deprecated and internally implemented with capacity, buffer, and push
    ssize_t pn_transport_input(pn_transport_t *transport, const char
*bytes, size_t available);
    // deprecated and internally implemented with pending, peek, and pop
    ssize_t pn_transport_output(pn_transport_t *transport, char *bytes,
size_t size);


    /** Report the amount of free space in a transport's input buffer. If
     * the engine is in an error state or has reached the end of stream
     * state, a negative value will be returned indication the condition.
     *
     * @param[in] transport the transport
     * @return the free space in the transport
     */
    ssize_t pn_transport_capacity(pn_transport_t *transport);

    /** Return a pointer to a transport's input buffer. This pointer may
     * change when calls into the engine are made. The amount of space in
     * this buffer is reported by ::pn_transport_capacity. Calls to
     * ::pn_transport_push may change the value of this pointer and the
     * amount of free space reported by ::pn_transport_capacity.
     *
     * @param[in] transport the transport
     * @return a pointer to the transport's input buffer
     */
    char *pn_transport_buffer(pn_transport_t *transport);

    /** Push data from a transport's input buffer into the engine and
     * return how much data was succesfully pushed.
     *
     * @param[in] transport the transport
     * @param[size] the amount of data in the transport's input buffer
     * @return the amount of data consumed
     */
    size_t pn_transport_push(pn_transport_t *transport, size_t size);


    /** Return the number of pending output bytes for a transport, or a
     * negative error code if the engine is in an error state.
     *
     * @param[in] the transport
     * @return the number of pending output bytes, or an error code
     */
    ssize_t pn_transport_pending(pn_transport_t *transport);

    /** Return a pointer to a transport's output buffer. Any calls into
     * the engine may change the value of this pointer and it's contents.
     *
     * @param[in] the transport
     * @return a pointer to the transport's output buffer
     */
    const char *pn_transport_peek(pn_transport_t *transport);

    /** Remove bytes from the head of the output buffer for a transport.
     *
     * @param[in] the transport
     * @param[size] the number of bytes to remove
     */
    void pn_transport_pop(pn_transport_t *transport, size_t size);

This new interface basically models the transport as a queue of bytes. On
the input side, pn_transport_input is decomposed into three separate parts:
capacity, buffer, and push. On the output side, pn_transport_output is
decomposed into three separate parts: pending, peek, and pop. These changes
would simplify the current driver implementation as it would no longer need
to do it's own buffering, and usage of the peek/pop style interface on the
output side would be a much safer interface for driver's in general and
allow the engine to know what has/hasn't made it onto the wire. To revisit
our hypothetical scenario, the driver no longer needs to bother with its 1K
buffer, it simply queries the transport for pending bytes, offers them all
to the socket write, and then pops only those bytes that have actually been
written.

This also moves the transport interface firmly into the camp of option (2).
It's somewhat unavoidable that the transport needs to do it's own
buffering, particularly when you consider things like SSL, and exposing
that fact at least allows the option of a driver that avoids the copy. It
should be noted though that there are some reasons that a driver may want
to own it's own buffers, and in these cases an extra copy will be
necessary, resulting again in something that looks like option (3). There
are ways we might address this in the future by providing hooks into the
allocation that the transport does, however I don't believe that affects
the part of the interface presented above.

Please have a look over this and let me know what your thoughts are. I'd
appreciate some comments from the Java folks on how to address this problem
for the Java transport interface. I don't think the buffer management needs
to be identical to the C as the whole buffer ownership problem is a bit
different with garbage collection (e.g. it's less of a problem to create
buffers in one component and pass ownership to another component), however
I believe the interface does need to address the same thing that the
peek/pop portion of this proposal is addressing, i.e. adding some means of
confirming when the bytes are actually "on the wire".

--Rafael

Reply via email to