[Alternative Subject line: TProtocol::readBinary(vector<uint8_t>& buf) ?]

I realize this is a bit of a long one, but I really wanted to make sure I had covered all bases before writing code.

To recap:
I'm locked into using vector<uint8_t> for binary blobs in my application, because a const vector<uint8_t>& is what the code above and below the RPC expects. And for good reasons: these blobs are typically on a critical path, so we want low latency; they carry packet payloads, so we need them to be contiguously addressable; and they usually come straight to or from socket syscalls, or even libpcap.
   So we don't use std::string to represent them. [-1]
Thrift's current implementations of TProtocol::readBinary() are likely to introduce copies in this case, because they use std::string.

Bruce Simpson wrote:
It's likely I'd also need to take a similar approach for the read path; my stub generator currently has to use a bounce buffer.

The TProtocol::readBinary() method probably needs a little bit more thinking about, and I'll come back to that -- I notice TBinaryProtocol::readBinary() is taking pains not to abuse 'auto' (stack) class storage.

I want to avoid intermediate copies if at all possible, without breaking Thrift's design assumption that TTransport has serial (ie not random access) semantics. This would be desirable for strings, also, especially if they get big.

In my situation, the caller generally wants to manage, or at least instantiate, the buffer which the TProtocol instance reads into. This is in order to avoid the cost of those intermediate copies, which can quickly mount up when packets are being sent rapidly.

Here's that example again from XORP's Thrift code generator:

Example:
[given 's' as a vector<uint8_t> argument]
   string tmp_s;
   nin += inp->readBinary(tmp_s);              // read into a std::string bounce buffer
   s.resize(tmp_s.size());                     // size the caller's vector to match
   memcpy(&s[0], tmp_s.data(), tmp_s.size());  // ...then copy the bytes yet again

In the above example, including the transport, there are going to be up to *5* copies of the same data for an in-flight RPC, not including kernel socket buffers.

Let's count them:
   1 for a TMemoryBuffer managed by my own TTransport class,
   1 for its socket read buffer,
   1 for the private buffer used by TBinaryProtocol itself,
   1 for the std::string temporary which TBinaryProtocol::readBinary() assigns to [0], and finally,
   1 in the vector<uint8_t> I want to read the blob into.

Reasons being:
   There is no getting away from the TMemoryBuffer, as my operations are async in the client (and potentially need one in the server too if we thread-pool), although I could consume it once the message is sent.
   I can't get away from using a socket read buffer, because of the need to re-acquire message boundaries from the stream when TCP is used. SCTP wouldn't have this problem, but that's academic right now. This cost is part of TNonblockingServer anyway, although I am using my own TTransport-derived class.
   The std::string temporary in TBinaryProtocol::readBinary() is probably going to be elided because it's an lvalue, but perhaps we should do what TCompactProtocol does and be explicit about assignment, to benefit from any library optimizations. [0]

In all cases of TProtocol derived classes, the size of the blob being marshaled is known upfront, so a reallocation often isn't needed if a buffer is already to hand. This optimization already exists in TBinaryProtocol for strings.

For TJSONProtocol, there is no avoiding intermediate buffering in readBinary(), as a base64 decode needs to happen. We could 'peel off' the str instance by reading the encoded text into it and overwriting each byte as we decode, which would eliminate another copy; but as that method is implemented in the SVN trunk, it's easier to just append to str.
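
As a strawman of that 'peel off' idea (this is not how TJSONProtocol is structured today, and the helper name is made up): read the base64 text into str first, then decode it in place. Every 4 input characters yield at most 3 output bytes, so the write cursor never overtakes the read cursor and no second buffer is needed.

   #include <cstdint>
   #include <stdexcept>
   #include <string>

   // Hypothetical sketch only: decode base64 in place, overwriting the
   // encoded text as we go. Padding ('=') is stripped; error handling is
   // minimal. Thrift's TJSONProtocol has its own base64 helpers; this only
   // shows the shape of the in-place idea.
   static inline uint8_t b64_value(char c) {
     if (c >= 'A' && c <= 'Z') return c - 'A';
     if (c >= 'a' && c <= 'z') return c - 'a' + 26;
     if (c >= '0' && c <= '9') return c - '0' + 52;
     if (c == '+') return 62;
     if (c == '/') return 63;
     throw std::runtime_error("invalid base64 character");
   }

   size_t base64_decode_in_place(std::string& str) {
     size_t in = 0, out = 0, len = str.size();
     while (len > 0 && str[len - 1] == '=') --len;      // strip padding
     while (in + 1 < len) {                             // need >= 2 chars left
       uint32_t group = (b64_value(str[in]) << 18)
                      | (b64_value(str[in + 1]) << 12);
       size_t nbytes = 1;
       if (in + 2 < len) { group |= b64_value(str[in + 2]) << 6; nbytes = 2; }
       if (in + 3 < len) { group |= b64_value(str[in + 3]);      nbytes = 3; }
       str[out++] = static_cast<char>((group >> 16) & 0xFF);
       if (nbytes >= 2) str[out++] = static_cast<char>((group >> 8) & 0xFF);
       if (nbytes >= 3) str[out++] = static_cast<char>(group & 0xFF);
       in += 4;
     }
     str.resize(out);                                   // str now holds the raw bytes
     return out;
   }

Once the encoded text has landed in str, a single call like this leaves the decoded bytes in the same storage, rather than appending them to a second string.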

More background:
Part of the problem is that std::string has no notion of shared storage at the level of its API. [1] TBinaryProtocol::readBinary(std::string& str) tries to avoid allocating stack temporaries; it uses a private, per-instance buffer managed with the C heap allocator. [2] Assigning to the 'str' argument then invokes the default allocator to manage its own copy of that buffer: another intermediate copy, as std::string has no knowledge of the buffer's allocator semantics. [3] [4]

Without hacking allocators, or adding another method to the library, it seems hard to come up with a compiler- and library-agnostic solution to my problem: coaxing Thrift to read into a vector<uint8_t> without further copying.

Reasons being:
* Whilst vector<T> has similar methods to std::string for free store management, exploiting the method similarity would require a template specialization. No way, jose -- that's largely academic, we are not doing template methods in TProtocol.
* I looked around to see if there were any constructor tricks I could use to trick vector<T> into using pre-allocated storage, or any way of easily casting to it without a copy. No dice (see the snippet below). [5]
* I'd just end up reinventing shared_ptr<T>; vector<T> has no knowledge of how to share and refcount storage, unless you explicitly tell it to. My application doesn't know about shared_ptr<T> yet, and using it would mean changing our app's existing code, something I'm keen to avoid given its size.
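
To make footnote [5] concrete, here is a trivial (purely illustrative) snippet; every route from a raw uint8_t buffer into a vector goes through a copy:

   #include <cstdint>
   #include <vector>

   int main() {
     uint8_t raw[1500] = {0};                          // e.g. a packet payload
     // vector<T> cannot adopt existing storage; both of these copy the bytes.
     std::vector<uint8_t> v(raw, raw + sizeof(raw));   // copies 1500 bytes
     v.assign(raw, raw + sizeof(raw));                 // assign() copies them again
     // There is no constructor or member function that takes ownership of
     // 'raw' in place, short of writing a custom allocator.
     return 0;
   }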


[on uint8_t/size_t vs vector<T>]
This seems a more reasonable approach than limiting it to the use of std::vector<T>; type safety is side-stepped for a very specific reason. A 'gmake' runs to completion with this patch on FreeBSD 7.2-STABLE/amd64.

So it seems that dealing with my need for vector<uint8_t> in my existing application's RPC methods, without introducing unnecessary allocations or copies on the receive path, boils down to two design choices:
   1. Just go ahead and implement it in libthrift itself.
   2. Add a helper method in my application's IPC library, specific to TBinaryProtocol, to read the blob from the TTransport directly.

Approach #1:
   Thrift is already using <vector> anyway, but not for T = uint8_t.
The additional cost of vector<uint8_t> usage in the library is zero for static linkage, if the linker knows to discard the template expansions from the text segment [6]. However, shared linkage is going to introduce weak symbols for the vector<uint8_t> methods due to template expansion, and thus add a small amount of runtime text cost to libthrift.so.

If a limit is placed on the payload size upfront, things are a little easier: we could, say, implement TProtocol::readBinary(uint8_t* data, size_t& len) without requiring vector<uint8_t>. It is a more generic interface, but the problem still looms of how to resize the buffer if the input is bigger than we expect, assuming the caller manages the buffer; it's probably not a good idea to be using C-style buffer management here. Even so, there is no easy way to convert a uint8_t array to a vector without a copy [5].

So it might be best just to add TProtocol::readBinary(vector<uint8_t>& buf) to the library.
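
For the sake of discussion, here is a minimal sketch of what that overload might look like for TBinaryProtocol, assuming the existing readI32()/readAll() signatures and the trans_ member; I haven't compiled this against trunk, so treat it as pseudocode in C++ clothing:

   // Sketch of approach #1: a readBinary() overload that pulls the payload
   // straight from the transport into caller-owned contiguous storage.
   uint32_t TBinaryProtocol::readBinary(std::vector<uint8_t>& buf) {
     int32_t size;
     uint32_t result = readI32(size);
     if (size < 0) {
       throw TProtocolException(TProtocolException::NEGATIVE_SIZE);
     }
     // The blob size is known upfront, so this reallocates only when the
     // caller's vector is too small; a reused vector keeps its capacity.
     buf.resize(static_cast<uint32_t>(size));
     if (size > 0) {
       trans_->readAll(&buf[0], static_cast<uint32_t>(size));
     }
     return result + static_cast<uint32_t>(size);
   }

With an overload like this, the generated stub shown earlier collapses to a single 'nin += inp->readBinary(s);' and the bounce buffer disappears.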

Approach #2:
On the other hand, if we hard-wire everything in our app to TBinaryProtocol, and explicitly add a helper method, the problem of doing it generically in the Thrift C++ client library, libthrift.so, goes away. I already have a few such helpers for encapsulating XORP XRL transport exceptions as TApplicationExceptions. (I did it this way mainly to avoid spamming the translated IDL with exception {} decls.) It would be useful at some point later on to have access to the other protocols, but TBinaryProtocol probably works best for us.
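
A rough sketch of what such a helper could look like (the name is made up, and it leans on TBinaryProtocol's wire format of an i32 length prefix followed by the raw bytes, plus the public getTransport() accessor):

   // Sketch of approach #2: an application-level helper, hard-wired to
   // TBinaryProtocol, that reads the blob from the TTransport directly.
   uint32_t readBinaryBlob(TBinaryProtocol& prot, std::vector<uint8_t>& buf) {
     int32_t size;
     uint32_t nread = prot.readI32(size);
     if (size < 0) {
       throw TProtocolException(TProtocolException::NEGATIVE_SIZE);
     }
     buf.resize(static_cast<uint32_t>(size));
     if (size > 0) {
       prot.getTransport()->readAll(&buf[0], static_cast<uint32_t>(size));
     }
     return nread + static_cast<uint32_t>(size);
   }

It is the same read sequence as the library-side sketch above, just living outside libthrift and tied to one protocol.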

This means less flexibility for our app, but doesn't introduce vector<uint8_t> to libthrift.so. I would prefer to solve the problem generally, and push a patch upstream, so others can benefit.

At times like this, I wish C++ had Python's buffer management.
On the other hand, this seems like a more general problem: i.e. that of avoiding intermediate copies on the critical path, whilst preserving TTransport's serial access semantics.

I'm sure others may also need to get data in and out of Thrift like this, and an option to stereotype a byte[] array as having these semantics in the generated C++ code may be useful.

Comments? Suggestions?

regards,
BMS

[-1] Before Thrift, there is probably intermediate copying going on. Long story.
[0] TCompactProtocol::readBinary() is more explicit: it uses std::string::assign() on the instance passed to it.
[1] Implementations can, and often do, use tricks inside the C++ standard library to share storage between instances of std::string which are associated with each other by assignment or construction; GNU libstdc++ refcounts the backing store in its implementation.
[2] Folk paying heed to thread safety will note that TBinaryProtocol::readBinary() has no re-entrancy guarantee, because of its private buffer; regardless of whether the C/C++ heap allocator is re-entrant or not.
[3] Once again: there is no const std::string specialization, so there is no opportunity for the compiler to elide the allocation and just emit a reference to const storage.
[4] In this case, because not all of the objects involved are instances of std::string, we can't benefit from the optimization in [1]; and even if they were, that optimization is implementation-defined.
[5] Same as for [3]; a const vector<T> gets no special treatment.
[6] LLVM will potentially be very smart at this later on.
