Rafi, I agree with the rational behind these changes.

>     /** Return a pointer to a transport's input buffer. This pointer
>     may
>      * change when calls into the engine are made. 

I think we'll need to be a little more liberal with the lifecycle guarantee of 
these buffer pointers.  Drivers based on "completion" models (rather than 
sockets) could be forced to do data copies rather than supplying the pointer 
directly to the completion mechanism.

Could we instead guarantee that pointers (and space) returned from the 
transport remain valid until 1) the transport is released or 2) the 
"push"/"pop" call is made against the transport?

Or perhaps a reference count-based interface would be safer?  Once the driver 
determines there is capacity/pending, it "reserves" the buffer and is required 
to "push"/"pop" to release it?

Oh, and those names absolutely stink. ;)  It's unclear from the function names 
what components of the transport they are affecting.  I'd rather something more 
readable:

pn_transport_capacity() --> pn_transport_input_capacity()
pn_transport_buffer() -> pn_transport_input_buffer()
pn_transport_push() --> pn_transport_input_written()

etc.

-K

----- Original Message -----
> A recent bug (PROTON-222) has exposed an issue with the transport
> interface. The details of the bug are documented in the JIRA, but the
> basic
> issue is that given the current transport interface there is no way
> for an
> application to discover via the engine interface when data has been
> actually written over the wire vs just sitting there waiting to be
> written
> over the wire.
> 
> Note that by "written over the wire" I mean copied from an in-process
> buffer that will vanish as soon as the process exits, to a kernel
> buffer
> that will continue to (re)-transmit the data for as long as the
> machine is
> alive, connected to the network, and the remote endpoint is still
> listening.
> 
> To understand the issue, imagine implementing a driver for the
> current
> transport interface. It might allocate a buffer of 1K and then call
> pn_transport_output to fill that buffer. The transport might then put
> say
> 750 bytes of data into that buffer. Now imagine what happens if the
> driver
> can only write 500 of those bytes to the socket. This will leave the
> driver
> buffering 250 bytes. The engine of course has no way of knowing this
> and
> can only assume that all 750 bytes will get written out.
> 
> A somewhat related issue is the buffering ownership/strategy between
> the
> transport and the driver. There are really three basic choices here:
> 
>   1) the driver owns the buffer, and the transport does no buffering
>   2) the transport owns the buffer, and the driver does no buffering
>   3) they both own their own buffers and we copy from one to the
>   other
> 
> Now the division between these isn't always static, there are hybrid
> strategies and what not, however it's useful to think of these basic
> cases.
> The current transport interface (pn_transport_input/output) and
> initial
> implementation was designed around option (1), the idea behind the
> pn_transport_output signature was that the engine could directly
> encode
> into a driver-owned buffer. This, however, turned out to introduce
> some
> unfriendly coupling. Imagine what happens in our hypothetical
> scenario
> above if the driver has a 1K buffer and the engine negotiates a 4K
> frame
> size. The engine might end up getting stuck with a frame that is too
> large
> to encode directly into the driver's buffer. So to make the interface
> more
> friendly, we modified the implementation to do buffering internally
> if
> necessary, thus ending up in some ways closer to option (3).
> 
> Now the reason this buffering issue is related to PROTON-222 is that
> one
> way to allow the engine to know whether data is buffered or not is to
> redefine the interface around option (2), thereby allowing the engine
> to
> always have visibility into what is/isn't on the wire. This would
> also in
> some cases eliminate some of the extra copying that occurs currently
> due to
> our evolution towards option (3).
> 
> Such an interface would look something like this:
> 
>     // deprecated and internally implemented with capacity, buffer,
>     and push
>     ssize_t pn_transport_input(pn_transport_t *transport, const char
> *bytes, size_t available);
>     // deprecated and internally implemented with pending, peek, and
>     pop
>     ssize_t pn_transport_output(pn_transport_t *transport, char
>     *bytes,
> size_t size);
> 
> 
>     /** Report the amount of free space in a transport's input
>     buffer. If
>      * the engine is in an error state or has reached the end of
>      stream
>      * state, a negative value will be returned indication the
>      condition.
>      *
>      * @param[in] transport the transport
>      * @return the free space in the transport
>      */
>     ssize_t pn_transport_capacity(pn_transport_t *transport);
> 
>     /** Return a pointer to a transport's input buffer. This pointer
>     may
>      * change when calls into the engine are made. The amount of
>      space in
>      * this buffer is reported by ::pn_transport_capacity. Calls to
>      * ::pn_transport_push may change the value of this pointer and
>      the
>      * amount of free space reported by ::pn_transport_capacity.
>      *
>      * @param[in] transport the transport
>      * @return a pointer to the transport's input buffer
>      */
>     char *pn_transport_buffer(pn_transport_t *transport);
> 
>     /** Push data from a transport's input buffer into the engine and
>      * return how much data was succesfully pushed.
>      *
>      * @param[in] transport the transport
>      * @param[size] the amount of data in the transport's input
>      buffer
>      * @return the amount of data consumed
>      */
>     size_t pn_transport_push(pn_transport_t *transport, size_t size);
> 
> 
>     /** Return the number of pending output bytes for a transport, or
>     a
>      * negative error code if the engine is in an error state.
>      *
>      * @param[in] the transport
>      * @return the number of pending output bytes, or an error code
>      */
>     ssize_t pn_transport_pending(pn_transport_t *transport);
> 
>     /** Return a pointer to a transport's output buffer. Any calls
>     into
>      * the engine may change the value of this pointer and it's
>      contents.
>      *
>      * @param[in] the transport
>      * @return a pointer to the transport's output buffer
>      */
>     const char *pn_transport_peek(pn_transport_t *transport);
> 
>     /** Remove bytes from the head of the output buffer for a
>     transport.
>      *
>      * @param[in] the transport
>      * @param[size] the number of bytes to remove
>      */
>     void pn_transport_pop(pn_transport_t *transport, size_t size);
> 
> This new interface basically models the transport as a queue of
> bytes. On
> the input side, pn_transport_input is decomposed into three separate
> parts:
> capacity, buffer, and push. On the output side, pn_transport_output
> is
> decomposed into three separate parts: pending, peek, and pop. These
> changes
> would simplify the current driver implementation as it would no
> longer need
> to do it's own buffering, and usage of the peek/pop style interface
> on the
> output side would be a much safer interface for driver's in general
> and
> allow the engine to know what has/hasn't made it onto the wire. To
> revisit
> our hypothetical scenario, the driver no longer needs to bother with
> its 1K
> buffer, it simply queries the transport for pending bytes, offers
> them all
> to the socket write, and then pops only those bytes that have
> actually been
> written.
> 
> This also moves the transport interface firmly into the camp of
> option (2).
> It's somewhat unavoidable that the transport needs to do it's own
> buffering, particularly when you consider things like SSL, and
> exposing
> that fact at least allows the option of a driver that avoids the
> copy. It
> should be noted though that there are some reasons that a driver may
> want
> to own it's own buffers, and in these cases an extra copy will be
> necessary, resulting again in something that looks like option (3).
> There
> are ways we might address this in the future by providing hooks into
> the
> allocation that the transport does, however I don't believe that
> affects
> the part of the interface presented above.
> 
> Please have a look over this and let me know what your thoughts are.
> I'd
> appreciate some comments from the Java folks on how to address this
> problem
> for the Java transport interface. I don't think the buffer management
> needs
> to be identical to the C as the whole buffer ownership problem is a
> bit
> different with garbage collection (e.g. it's less of a problem to
> create
> buffers in one component and pass ownership to another component),
> however
> I believe the interface does need to address the same thing that the
> peek/pop portion of this proposal is addressing, i.e. adding some
> means of
> confirming when the bytes are actually "on the wire".
> 
> --Rafael
> 

Reply via email to